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ABSTRACT 



In today's organizations, information in current databases is stored in a 
variety of heterogeneous systems and data organizations. This situation 
causes problems when trying to integrate them into a federated or multi- 
database solution. Particularly troublesome is semantic conflict, or differ- 
ences in the meanings of data structxires and definitions in heterogeneous 
databases. This thesis proposes a systematic approach towsurds identif3dng, 
classifying and resolving semantic conflicts. Using an entity relationship ap- 
proach as the integrating model, a framework is developed which describes 
all possible semantic conflicts among the imderlying schemas. This frame- 
work can be employed as a methodological tool during an integration effort. 
Possible resolution strategies are offered for each type of conflict and applied 
to the separate databases to realize a common global schema which could be 
used to formulate effective queries against the total original volume of data. 
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I. B^RODUCTION 



A. BACKGROUND 

Organizations have over time developed many disparate databases to 
manage information. These databases have been implemented using a wide 
variety of incompatible models, languages and storage methods. Migration of 
database systems to an integrated strategic Information Resource based 
architecture will require the interoperability of these diverse sources of data. 
Conflicts among these heterogeneous databases will impede consolidation 
efforts. 

This problem can be addressed in the near term by transforming the 
schemas of incompatible data orgeinizations, such as hierarchical and rela- 
tional, into a common data model which will capture all information con- 
tained in the original databases and make it available to the user in a unified 
form. The Enhanced Entity-Relationship model, which is both semantically 
rich and conceptually simple, can serve as an integrating model for combining 
the data from different databases. 

With the independent databases represented in equivalent schemas, a 
framework for the identification, classification, and resolution of semantic 
data conflicts can be developed. The integrated global schema can guide the 
formulation of queries, and the detailed understEmding of semEintic conflicts 
among the component databases resulting from the re-engineering process be 
incorporated in the design of a global controller which can manage the 
retrieval of information from a federated database application. Users 
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requiring access to data from several disjoint databases c£in then process 
queries against the reconciled common schema. 

In the longer term, data element stemdardization efforts may obviate 
many of the semantic conflicts addressed by this thesis. However, different 
preferred forms of organizing corporate information will remain specific to 
various functional domains. Tools for the integration of data from heteroge- 
neous databases will still be required. 

B. OBJECTIVES 

The objective of this thesis is to develop a framework for identifying, 
classif 3 dng, and resolving semantic conflicts using the Enhanced Entity- 
Relationship model. This includes transforming heterogeneous databases into 
a common schema for comparison and identification of semantic conflicts, 
illustrating all possible forms of semantic conflict, both at the schema and at 
the data level. Using real world examples, the classification framework will 
be applied to diverse database applications in the course of am integration 
effort. Finally, this analysis will suggest, in general terms, resolutions to the 
various semantic conflicts identified through the use of the framework. Inte- 
gration of the component databases into a global schema and design guide- 
lines for the implementation of a global controller completes the objectives. 

C. RESEARCH QUESTIONS 

The following research questions are addressed in this thesis: 

(1) What qualities are needed in an integrating data model to integrate 
data from mvdtiple sources? 

(2) What types of semantic data conflicts arise in heterogeneous data- 
bases, and what is an appropriate framework for classifying 
semantic conflicts? 
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(3) How can semantic data conflicts best be identified and resolved to 
allow integrated access to corporate information stored in databases 
using different data models, definitions, and constraints? 

(4) How might semantic conflicts be resolved to allow the formation of a 
common global schema incorporating heterogeneous databases 
which use different data models, definitions, and constraints, and 
what guidance can the re-engineering process give toward the 
design of a global controller component for a federated database 
application? 

D. SCOPE AND LIMITATIONS 

This thesis will review the potential semantic data conflicts which can 
arise in heterogeneous databases, and develop a framework for classifying 
these conflicts. A common data model for use in integrating diverse sources of 
data is examined and evaluated for appropriate qualities. General measures 
to resolve these conflicts with the aim of integrating the data and useful in 
the design of a functional federated database controller will be examined. 

This thesis will not address heterogeneity at the platform or Database 
Management System level. Real world database specifications from Depart- 
ment of Defense users will be analyzed at the level of descriptive detail 
obtained. Reasonable assumptions will be made (in terms of detailed data 
definitions, etc.), where necessary, to illustrate the types of semantic conflict 
under discussion. 

E. METHODOLOGY 

The methodologies used in this thesis include: 

(1) Obtaining database specifications for several real world applications 
from the same functional domain (i.e., administrative personnel 
management). 

(2) Examining an appropriate common integrating conceptual data 
model for integration of diverse databases. 
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(3) Transforming the separate databases into equivalent schemas, 
using the conceptual integrating data model. 

(4) Analyzing and comparing the equivalent schemas to develop a 
framework for identifying and classifying all possible semantic 
conflicts. 

(5) Exploring possible solutions to the conflicts identified, and using 
the firamework and resolution heuristics to integrate a global 
schema which subsumes all available data from the candidate 
databases. 

(6) Using the knowledge of semantic conflicts gained to suggest design 
strategies for a global controller component to manage a federated 
database including the subject databases. 

(7) Reviewing the experience of the integration process to suggest 
future areas of research into useful techniques for resolving seman- 
tic heterogeneity. 



F. ORGANIZATION 

The thesis is organized in the following manner: 

Chapter II addresses the proliferation of heterogeneous databases in 
organizations. This includes an analysis of various levels of heterogeneity, 
and suggests sources of different kinds of conflicts. 

Chapter III reviews the required quaUties of a suitable integrating 
model, with particular mention of the various types of existing databases 
which might have to be modeled. The common conceptual model used in this 
thesis is explained, and examples are given of the diagrammatic conventions 
used in following chapters. 

Chapter IV presents a real world scenario of heterogeneous databases 
drawing on specifications obtained firom various Department of Defense fimc- 
tional applications. The federated database approach to integration is des- 
cribed, including the role of the global controller component which manages 
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the resolution of semantic conflicts at the functional level. Each database is 
transformed into a common equivalent schema using the integrating model. 

In Chapter V, the equivalent schemas developed in Chapter VI are 
systematically compared to form a classification freunework of semantic het- 
erogeneity. Examples of each type of semantic conflict are illustrated and dis- 
cussed based on the specifications detailed in the appendices. 

Chapter VI explores in general terms possible means of resolving each 
type of semantic conflict expressed in the classification framework. The pro- 
posed solutions are then applied to the individual schemas to create a com- 
mon global schema which includes all information originally available. 
Additionally, this chapter applies the semantic conflict ftmnework to theoret- 
ical design considerations of a federated database approach to integration. 

The concluding chapter reviews lessons learned in the course of integrat- 
ing real world heterogeneous databases, and offers conclusions about identify- 
ing and resolving semantic conflicts between databases. Recommendations 
and suggested areas of future research are offered based on the results of this 
analysis. 
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II. PROLIFERATION OF HETEROGENEOUS DATABASES 

IN ORGANIZATIONS 



A. HETEROGENEITY IN DATABASES 

In a perfect world, the advantages of interoperability would motivate 
end users, designers, and developers to ensure that seamless and effective 
information sharing were built into database applications from the ground 
up. Still, heterogeneous databases have proliferated throughout organiza- 
tions for a variety of reasons. A (largely) homogeneous paradigm woxild be 
practical for an organization entering the database field from a standing 
start, with access to the full spectrum of currently available technology. 
Gradual evolution, however, has resulted in the current situation. 
Organizations such as DoD have continuously developed database applica- 
tions over 40 years. Recurring cycles of hardware, software, and technology 
during that time have all contributed to the diversity of databases in use 
today. In addition to these essentially technical issues, the incremental, 
disjoint, and airbitrEuy implementation of conceptual design methodologies 
has contributed to the present chaotic assortment of incompatible systems. 

This evolution has resulted in a database environment with three levels 
of heterogeneity. At the lowest level, different database applications are 
implemented on a wide range of hardware platforms. Similar hardware can 
run a variety of operating systems. Distributed databases must communicate, 
using compatible communications protocols. Variation in these protocols 
introduces more conflict. At the next level. Data Base Management Systems 
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(DBMSs) may be incompatible, even when intended to work with similar data 
structures. Finally, when data is named, defined and organized into a 
particular architecture, subjective design choices introduce fundamental and 
potentially intractable semantic conflicts. Platform and DBMS heterogeneities 
are discussed in section one, while semantic heterogeneity is discussed in 
section two. 

This thesis deals with semantic conflicts which arise at the schema, or 
architectural level of database organization, and at the data level. Such con- 
flict arises from both technical and methodological causes. Incidental hetero- 
geneity issues, caused by hardware, operating system, DBMS software, and 
communications protocol variations are not addressed. 

1. Platform and DBMS Heterogeneity 

When information was first stored for electronic manipulation by 
computer systems independent from the specific application programs doing 
the manipulation, it was organized as 'flat files'. These were simple, identi- 
cally formatted records, accessed by the application program through the 
program's explicit knowledge of where in the record a given fact could be 
found. No attempt was made to make associations between individual 
records, since each was treated by the application as a unique piece of 
information. Additions to the set of records was therefore easy, but a change 
in the structure of a record very diflicult, since the entire application had to 
be rewritten to preserve the necessary explicit internal map of the record's 
structure. 

Initial interest in database research centered on the management of 
data in business applications such as automated payroll, inventory, and 
transaction processing. These domains reqtaired efficiency in accessing and 
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modifying very large amounts of data, and were oriented toward well defined, 
repetitive processes which could be run from start to completion in a batch 
mode. Additionally, these first databases appeared when the physical limit- 
ations of the available hardware imposed very definite restrictions on the 
architectiire which could be used to organize the data. 

These factors influenced the hierarchical, or tree-based approach to 
data management. Data records are assembled into a collection of trees, some 
being root records, and all others having a unique parent record. This organi- 
zation is amenable to the simple relationships of employee to wage, tax code, 
dependents, etc. in a payroll scenario, or the assembly to subassemblies to 
parts relation of an inventory. Since the processing is repetitive, and need not 
be done in real time, hierarchic database programs can be optimized to 
navigate through the tree structure even when this is highly complex. 
Finally, the hierarchic data model was suited to magnetic tape storage, an 
economic requirement before random access disk-based storage became 
affordable. 

Evolutionary modification of the hierarchic data architecture led to 
the Conference on Data Systems Languages (CODASYL) standard. This 
arrangement allows more complex, and thus more useful, relationships be- 
tween data elements to be represented, with records arranged into a directed 
graph or network. Efficient implementation of the network organization both 
required and exploited the more flexible capabilities of direct access storage 
media. Disks rapidly replaced tape as their cost-performance ratio improved. 
Application programming for a network database requires a highly proce- 
dural navigation oriented language, like the hierarchic model, which restricts 
the degree of dynamic processing changes available to the end user. 
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The relational data model was pioneered in the early nineteen 
seventies and offered a fundamentally different approach to data storage. 
Data is represented as simple tabulsur structures (relations), and access is 
allowed through a high level, non procedxiral query language. The complexity 
of relationships between data elements is unrestricted. The application 
programmer, or end user, specifies a predicate which identifies the desired 
record or combination of records, and the DBMS translates that specification 
into an efficient algorithm which performs the database access. 

Even the most advanced relational models are not without draw- 
backs, however. The computational complexity of solving queries involving 
multiple large relations can be prohibitive, and much research has gone into 
the optimization of relational queries. Efficient design, or normalization, of 
the relations themselves to eliminate redundancy and logical anomalies has 
also required theoretical advances. New approaches to allocation and 
management of disk space and memory buffering routines have been neces- 
sary to minimize storage cost and access delays. While the relational data 
model provides the maximum flexibility in orgemizing and manipiilating data 
in the early nineties, it does so at some cost. 

The evolution of theoretical work on data storage and processing, 
and the hardware development which facilitated and paralleled it represent 
the technical factors which lead to heterogeneity in databases. As applica- 
tions were developed and brought into production, organizational pressures 
prohibited continuous re-engineering of applications to exploit each new theo- 
retical or hardware development, even where that was appropriate. It must 
be kept in mind that some degree of heterogeneity in organizational 
databases is not an aberration which can, or even should, be completely 
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eliminated. A relational database offers the flexibility to deal with arbitrarily 
complex, unstructured queries on an ad hoc basis, but its computational 
overhead does not recommend it for a mature inventory system. When update 
processing requirements are relatively static and well understood, transac- 
tions against the inventory can be done periodically, and the number of 
records is very large, a hierarchic database is a better choice. Over the range 
of organizational activities within the DoD, various problem domain solutions 
will naturally fell to diverse appropriate data model/hardware combinations. 

2. Semantic Heterogeneity 

A separate causal factor leads to semantic data conflicts. Generally, 
these arise from variations in database design methodology and implementa- 
tion. The technical factors discussed above concern physical application level 
strategies and models. The hierarchic, network and relational data architec- 
tures deal with how individual data elements are organized, physically 
linked, retrieved, and manipulated by the hardware and software of an appli- 
cation. As suggested, technical issues lead to a natural, unavoidable diversity 
in organizational databases, based primarily on processing efficiency within 
particular problem domains. Methodological factors, on the other hand, result 
from human individuality, differences in perception, and preferences. They 
give rise to heterogeneity between databases addressing the same functional 
application, using identical hardware, operating systems, data models, and 
DBMS software. Because data definition, naming conventions, and conceptual 
organization are inherently subjective issues, semantic conflicts are almost 
guaranteed amongst databases developed by different teams in the absence of 
strictly enforced strategic design guidelines. 
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Individuals interpret the world from their own personed perspec- 
tive. Organizations, and subdivisions of organizations, have similarly diverse 
views of their environment. Items of interest, which become data elements, 
aggregations of elements, which become records and logical entities in organi- 
zational databases, are named, defined, and organized in this qualitative, 
subjective, environment. If two departments of the same company imdertook 
to develop personnel databases, without specific guidance from the front of- 
fice, it would not be surprising to find different naunes for similar employee 
attributes, identical field definitions for contradictory elements, or even com- 
pletely different ways of structuring the problem. This is a predictable and 
imsurprising consequence of individual and organizational differences. It is 
germane that the type of conflict described could, and would, arise even if 
central guidance was provided, but was restricted to mandating a particular 
hardware/DBMS suite. 

Yet this is exactly how many organizations, including the DoD, 
have developed their database applications over the last forty years. Until 
very recently, only particular hardware, operating systems, or DBMSs have 
been standardized among the services and their various departments. There 
was still no strategic guidance which provided common definitions, naming 
conventions, etc. at the element or entity level. Thus even if DBMS/platform 
conflicts do not arise, semantic conflicts remain which can make databases 
incompatible. 
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B. THE NEED FOR INTEGRATION OF HETEROGENEOUS 

DATABASES 

As organizations mature in the use of information technology, the 
potential benefits of consolidating heterogeneous databases become irre- 
sistible. Vital corporate information is captured, stored, and available to 
decision makers and operational functions from many database applications, 
but incompatibilities can prevent the integration of data from different 
sources. Elimination of data redimdancy, to achieve cost advantages, means 
more applications must share compatible data. Data accuracy, critical for 
high-risk decisions, can be enhanced by identifying disjoint data among 
similar databases and resolving the semantic conflicts. 

The need for standardization of data management has been recognized 
by the DoD and forms a central part of the Corporate Information Manage- 
ment (CIM) initiatives. Current data dictionary efforts, which address the 
problem of semantic data discrepancies at their lowest level, hold promise for 
ameliorating the problem in future applications. There is also an urgent need 
for high level methods to allow the integration of information in currently 
existing heterogeneous databases. 

Two approaches have been identified which address this issue. The 
multidatabase approach leaves the component databases in their native form, 
but provides transparent access to all included information. Users are aware 
that they are dealing with multiple diverse databases, both schematically, 
and physically. Alternatively, the federated approach consolidates the 
component databases under a global schema, and gives both location and 
heterogeneity transparency. Users interact with the data as though it were in 
a single, physically contiguous, logically consistent database. Either approach 
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requires a strong logical data model to describe multiple individual physical 
data architectures. The next chapter addresses a suitable integrating model. 

Once all databases of concern have been expressed in a common concep- 
tual schema, semantic conflicts among individual data elements can be easily 
identified. Chapter IV presents three heterogeneous real-world databases and 
describes the process of transforming them into equivalent schemas in the 
common integrating model. Chapter V develops a firamework of semantic het- 
erogeneity for the integrating model. The framework enables the classifica- 
tion of semantic data conflicts stemming from hvunan variation in method- 
ological implementation. With a comprehensive integrating model, and a tax- 
onomy for identifying semantic heterogeneity which includes, schematic and 
data conflicts, possible solutions can be proposed. This is the subject of 
Chapter VI. Conflicts in architecture and data organization which arise at the 
Platform/DBMS level are properly addressed by the detailed implementation 
of the integration effort. The resulting consolidated, reconciled information 
can be accessed through appropriate systems to provide organization-wide 
use of existing heterogeneous databases. 
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m. THE ENHANCED ENTITY RELATIONSHIP MODEL 



A. DATA MODELING 

When a database application is developed, the segment of the real world 
to be modeled is analyzed in light of the users’ requirements. The designers 
make a choice about the conceptual data model to be used. The choice of 
model is governed by its perceived appropriateness to the problem domain, 
the personsd preference of the designers and their familiarity with various 
methodologies. Conceptual modeling is done at a very general level of 
ansdysis, and has only marginal impact on implementation decisions. The 
data elements and arrangements suggested by the analysis must then be 
formally specified, and their structure and behavior defined in terms of the 
logical model. The ultimate physicad organization of data (in a network, rela- 
tions, etc.) is independent of the logical schema used for design, and is chosen 
as a function of processing, access requirements, transaction frequency, and 
the structure of the resulting schema. 

In considering heterogeneous databases with a view toward information 
sharing and consolidating access, the original logical design is often unavail- 
able, and the conceptual model used unknown. The final application architec- 
ture may provide no indication of the conceptual scheme used in the initial 
analysis. A logical integrating model which can describe multiple diverse 
implementation models is needed to subsiune the heterogeneous component 
databases and allow them to be expressed in a consistent schema. 
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Of the potential candidates for an integrating conceptual model, the 
Entity-Relationship (ER) approach stands out as a strong candidate. It is 
semantically rich, conceptually simple, and can capture arbitrarily compli- 
cated relationships between atomic elements and larger groupings of infor- 
mation. It is widely used in database design [Ref. 1], and offers a natural and 
intuitively understandable way of displaying information and real world rela- 
tionships. With the additional semantic expressiveness provided by exten- 
sions to the ER model (referred to as Enhanced Entity Relation, or EER), 
newly popular concepts such as inheritance can also be defined. 

Although sophisticated renditions of EER schema become diagrammati- 
cally complex, the essential representation of atomic data elements as con- 
nected attributes which describe an entity, or real world item of interest, is 
ftmdamental. Relationships between entities, and the characteristics of the 
relations (cardinality, mandatory participation, etc.), are explicitly defined 
and represented by the model, making it simple to visually interpret an ER 
schema. The ER/EER data model is one of the most widely used logical 
schemes for conceptual database design [Ref. 2]. This wide acceptance, as 
well as its superior descriptive qualities, make it the most appropriate 
integrating model. 

1. Top Down Modeling 

In top down database modeling, the user’ s real world, or the portion 
of interest, is analyzed in terms of data requirements and relationships. 
Appropriate data types are defined, and the information is arranged in logical 
groupings which meet the users’ needs. At this level of modeling, no imple- 
mentation details are considered, and the resulting schema is easy to 
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understand and verify with non-technic2d users. The basic tool for this 
process is the conceptual data model. 

For example, consider the design of a database to organize informa- 
tion about officer personnel for the Depeurtment of the Navy. The user has 
specified that the information of interest includes basic data, such as name, 
rank and serial number, as well as the officer’s duty assignment. The 
designer, using the EER technique, takes these requirements, and arranges a 
conceptual schema which represents the officer as em entity, defined by the 
attributes of name, rank, and social security number. Likewise, the unit he or 
she is assigned to is shown as an entity, defined by a unit identification code 
attribute. The relationship between the officer and the unit is also 
represented. 

The user also provides specifications about appropriate data types 
for various elements. Name might be most usefully defined as a character 
string, while rank is desired to be represented by some arbitrary code which 
fits into the user’s overall information processing philosophy. At this point, 
uniquely defining, or key, attributes are defined for entities where possible 
confusion could exist between two sets of information. This could occur if two 
officers had identicsd names and ranks. When the conceptual schema is 
complete, the user confirms that the information and arremgement meets the 
database requirements, and implementation proceeds through the mapping 
of the conceptued schema to a DBMS, and design of physical data storage 
structures. 

2. Bottom Up Modeling 

The use of conceptual design techniques such as the EER model in a 
bottom up manner differs in that the purpose is not to capture a suitable 
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schema from real world information. Instead, the intent is to reverse engineer 
a conceptual schema from an existing database implementation. Data types, 
file structures and attribute definitions have already been designed and 
implemented. Transforming the low level database implementation 
specifications back into a high level conceptual schema allows analysis of the 
choices made in arranging the original data requirements. 

More important, bottom up data modeling can render completely 
different database implementations in equivalent form for comparison and 
interpretation. This is the main thrust of reverse conceptual modeling in this 
thesis. The EER model is semantically rich enough to conceptually represent 
many existing database implementation. The EER model will be used in 
Chapter IV as a common model to transform diverse heterogeneous databases 
into equivalent schema’s, for analysis of potential semantic conflicts. The 
following sections present the EER concepts and the diagrammatic 
conventions used in this thesis. The specific EER model used throughout this 
thesis is taken from [Ref. 3]. The closing section of this chapter briefly 
describes the application of the EER modeHng concepts to bottom up analysis 
of various different database implementation types. 

B. EER CONCEPTS 

The Enhanced Entity Relationship model is essentially very simple. 
Information is represented by entities, which £ire described by attributes, and 
associated to each other by various kinds of relationships. With intuitive 
extensions of these three simple concepts, arbitrarily complex views of the 
real world can be expressed in a graphic and easily imderstood way. Since the 
use of the model here is not to capture a top down schema from beginning 
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user requirements, not all of the semantic expressiveness available will be 
described. For a complete examination, the reader is directed to [Ref. 3]. 

1. Entities 

The central object in the EER model is the entity, which represents 
a real world ‘thing’ with independent existence. It may be something with 
physical existence, such as an officer, or a concept, like a security cle£ir2mce. 
Entities are described by properties, which are real world facts about an 
entity. They are also associated with other entities to capture additioned 
information. 

Entities can be unique, and independently defined, or they can be 
dependent on the existence of another entity. Such entities are referred to as 
‘weak’. A security clearance entity is an example of a weak entity, since in the 
real world, it doesn’t make sense to think of that entity without a related 
officer, who holds the clearance. Weak entities have their own attributes, and 
represent important real world concepts, but must be associated with an 
identifying owner to have meaningful semantic content. 

Figure 1 illustrates an entity t 5 npe. 



MEMBER 

Figure 1. Entity Type 

Figure 2 is a weak entity type, having no useful semantic content 
without em identifying relationship. 

SCRTY-CLEAR 

Figure 2. Weak Entity Type 
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2. Attributes 



Descriptive facts about entities are called attributes. They can be 
simple, single valued attributes, such as a social security number, or they 
might be multivalued, or even made up of other attributes. Composite attri- 
butes make it possible to represent data which may be handled as a whole 
sometimes, but in part at others. An officer’s name might be a composite 
attribute, if it is used in full (Last, First, Middle) in some instances, and 
sometimes in part (Last only). 

A critical attribute concept is that of the key. A key attribute is one 
which uniquely defines the entity it describes. This allows distinguishing 
between instances of an entity type for which all other attributes are identi- 
cal. Social security number is a very common key attribute. A related concept 
is that of the partial key. A parti£d key attribute uniquely describes a weak 
entity when concatenated with the key of the weak entity's identif 3 dng owner. 

A final very useful attribute type, is the derived attribute. This rep- 
resents information which is not explicitly captured in the database, but may 
be determined, or cedculated, from related information. The total number of 
officers assigned to a unit, for instance, could be calculated from the number 
of related officer entities for each instance of the unit entity. TotsJ number 
assigned could then be assigned to the ^lnit entity as a derived attribute. 

Figure 3 shows how a simple attribute is depicted graphically. 

^rWPMBER-NAMO 

Figure 3. Attribute 
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Figure 4 depicts a key attribute, auid a partial key attribute. 




Key Attribute Partial Key Attribute 

Figure 4. Key Attribute and Partial Key Attribute 



Figure 5 illustrates a multivalued attribute (£ui attribute with a 
single meaning, for which an entity might have multiple instances). 



AWARDJ 



Figure 5. Multivalued Attribute 



Figure 6 represents a composite attribute. 



,^ CTEMB£R-NAME^;y^ 
DsmofT 



Figure 6. Composite Attribute 



Figure 7 shows how a derived attribute is diagrammed. 



Figure 7. Derived Attribute 
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Figure 8 shows a parti8d completed entity with its descriptive 
attributes. 




Figure 8. Member Entity 

Figure 9 represents a weak entity with its partial key, 

SCRTY-CLEAR 

Figure 9. SCRTY-CLEAR Weak Entity 

3. Relationships 

The third basic concept in EER modeling is the relationship. This is 
used to represent associations of varying t 3 npes between entities. An officer, 
for example, could be related to a unit by the relationship ‘Assigned To’. The 
completed schema then makes it explicit that an officer is assigned to a unit, 
by connecting the two entities with a relationship. Weak entities are associ- 
ated with their identifying owners by an identifying relationship. 

Relationships can capture a very large range of semantic meaning 
by the addition of relationship cardinality. Cardinality refers to constraints 
on the relationship. In other words, if every officer is assigned to one and only 
one imit, this is defined in the EER schema by adding a cardinafity number to 
the relationship in the direction from the officer to the unit. Units, logically, 
would have many officers assigned, and this would be represented by an 
appropriate cardinality in the relationship direction from the vmit to the 
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officer. The graphical conventions used to depict cardinality will be shown in 
the following section, and their usage will be more obvious. 

Additional constraints on relationships are referred to as partial 
and total participation. This can be visualized by considering the weak entity 
example above. Since a security clearance has no semantic meaning without 
an identifying relation to an owner officer entity, security clearance 
participation in that relation must be total. In other words, each and every 
instance of security clearance must participate in the identifying relationship 
with some officer entity, or it cannot exist in the schema. Conversely, it is 
possible to conceptualize a unit, perhaps newly formed, which has no officers 
assigned. This allowable meaning is represented by a partial participation 
relationship. A unit entity is allowed to exist without necessarily partici- 
pating in a relationship with a particular officer. By combining participation 
and cardinality constraints on relationships, any conceivable association of 
entities can be modeled using EER techniques. 

Figure 10 is an example of a simple relationship. 




Figure 11 illustrates the identifying relationship between a weak 
entity and it’s identifying owner. The double diamond aroimd the relationship 
specifies that it is an identifying relationship. The double hne connecting the 
weak entity to the relationship is used to indicate the total peirticipation of 
the weak entity. This is a condition of the identifying relationship, but not 
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restricted to this relationship type. Any relationship type can be constrained 
on either side by total participation. 




Figiare 11. Identif 5 nng Relationship of SCRTY-CLEAR to MEMBER 



Figure 12 gives examples of various cardinality constraints on 
relationships. The cardinalities are read in the direction away from the con- 
strained entity. In other words, for the cardinality label immediately above 
the connecting line, each entity is related to one and only one instance of the 
other entity (one-to-one). Next above shows the ENTITY 1 related to many 
ENTITY 2 (one-to-many) Finally, above the line, is an exsimple of many enti- 
ties on either side related to many entities (many-to-many). 
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Figure 12. Various Cardinalities of Relationships 

Below the Hne are illustrated more complex cardinality constraints. 
These are read identically, in terms of direction, the conceptual extension 
being the range defined in the parentheses. The left nmnber of the ordered 
pair represents the lower bound on participation, and the right number the 
upper bound. Thus, reading the example immediately below the line depicts a 
relationship in which the ENTITY 1 must be related to at least one, and may 
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be related to any number of ENTITY 2 Conversely, ENTITY 2 is related to one 
Eind only one ENTITY 1. The remaining cardinality constraints are read in a 
similar m ann er, and are not exhaustive. 

4. Complex Data Organizations 

Complex and useful data organizations such as those becoming 
popular in Object Oriented analysis are represented in the EER model by 
specialized extensions to relationships. Two will be described below. The 
Generalization/Specialization structure, which captiures the concept of inheri- 
tance, and the aggregation structure, which captures the whole-part relation- 
ship concept. 

a. GeneralizationI Specialization 

The Generalization/Specialization (Gen/Spec) relationship is 
used to model a schema of entities, which all posses common attributes as 
part of their description, but which for some subset of entity instances, 
unique attributes define logical subclasses. It is sometimes referred to as an 
IS-A(N) relationship (i.e., the Specialization entity IS-A Generalization entity). 
A simple illustration which expands on those used above is to consider a 
personnel database containing data not only on officers, but all members of a 
given service. For all entities representing service members, a large number 
of attributes, such as name, social security number, etc., will be same. That 
is, all members will possess these attributes. Officer members, however, will 
have different attributes than enlisted members, and it is conceptually ele- 
gant to be able to model this phenomenon explicitly. 

This is done with the Gen/Spec relationship which connects the 
generalized entity member, to the specialized entities enlisted, and officer. 
Thus for a given instance of officer, the full set of defining attributes consists 
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of those belonging to the generalized member, in addition to the specific sub- 
set of attributes which define the specialized officer. The officer instance 
‘inherits’ the attributes of its related member instance. 

Gen/Spec relationships can be extended with various qualifica- 
tions, just as simpler relationships. Two Gen/Spec constraints utilized in this 
thesis are those of total participation of the general entity, and disjointness. 
Total participation represents the semantic concept that each and every 
member of instance of the general entity must belong to one or more of the 
related specialization entity types. If on the other hemd, it were allowable for 
a general entity to exist independently (that is, only possess the generalized 
attributes), the general entit^s participation in the Gen/Spec relationship 
would be partial. 

Disjointness indicates that each specialization entity must 
belong to only one specialization. In the member to enlisted/officer relation- 
ship, disjointness is enforced, since each member must be either an officer, or 
an enlisted. Alternatively, a Gen/Spec relationship in which a specialization 
entity could belong to more than one specialization would be an overlapping 
type. 

Figure 13 diagrams a disjoint Gen/Spec relationship with the 
constraint that each and every MEMBER must belong to either the MEMBER- 
OFR, or the MEMBER-ENL specialization. Disjointness is represented by the 
small ‘d’ in the relationship circle, and total participation of MEMBER by the 
double line connecting MEMBER to the relationship. 
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Figure 13. MEMBER to MEMBER-OFR Generalization/Specialization 
Relationship 

b. Aggregation 

Aggregation is an abstraction concept for building composite 
entities from component entities. This can be thought of as a PART-OF rela- 
tionship. For instance, Army, is PART-OF Department of Defense. This is 
extremely useful in EER modeling of some advanced database applications. 

C. APPLICATION TO REAL WORLD DATABASES 

This section reviews, in general terms, some of the conceptual bottom up 
modeling techniques applicable to transforming existing database implemen- 
tations into equivalent EER schemas. In some ways, reverse conceptuaUzation 
of existing data organizations is simpler than top down design, since many of 
the structural choices have already been made, and may be obvious. On the 
other hand, absent the original design, some arbitrariness is inevitable, and 
it falls to the re-engineering analyst to make logical choices. While 
randonmess of data structuring is a danger of this approach, if consistent 
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criteria £ire used, the purpose of the modeling, which is to allow methodical 
comparison of semantic schemas, will be fulfilled. 

1. Flat Files 

Conceptually, a flat file data orgemization cem be simply rendered 
as a single entity possessing all the attributes defined in its record structiure. 
However, this approach is inelegant, amd loses much of the semantic content 
which is likely represented by the original file record. Thus, repeating groups 
of fields are sought, and extracted as separate entities with appropriate rela- 
tionships to the entity suggested by the major category of the record. 
Similarly, if a record has fields which are utilized for different meanings 
depending on the values of other fields, this suggests that the record actually 
describes a Gen/Spec orgamization, and is so mapped to the EER diagram. 
Accurately recreating the cardinalities of relationships is the most difficult 
part of bottom up modeling, since these constraints, while explicitly repre- 
sented by the EER schema, are in general enforced at the implementation 
level, and often are not included as part of the available database definition. 
Additionally, whether or not a particular entity is ‘weak’, and the choice of 
identifying owner for those which are, may not be obvious. In these cases, the 
re-engineering analyst must make logical and consistent choices based upon 
knowledge of the information domain. 

2. Hierarchic 

Basic, restricted data organizations such as the hierarchic, or tree 
based structure, can be fully described quite simply. The relationship be- 
tween entities is one-to-one/many, and the presence of specific attributes may 
allow the collection of several elements into attributes describing a 
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generalized entity. Layered grouping of entities as ‘children’ of other entities 
requires no extension of the concept. 

3. Network 

The network model builds on the tree architecture, but allows addi- 
tional associations between entities. This arrangement is restricted by the 
condition that a ‘child’ may have only one ‘parent’ of a given tjq)e. The result- 
ing multiple one-to-memy relationships form the network, for which the model 
is named. Like mapping the hierarchic structure to an EER schema, no par- 
ticularly complicated analysis is necessary, other than choosing appropriate 
cardinality of relationships, if this is not expressly defined for the re-engi- 
neering analyst. 

4. Relational 

Relational data structures are not constrained in the complexity of 
connections between data elements and entities. But in modeling the rela- 
tional implementation to an EER schema, most often one may proceed from 
the assiunption of a correspondence between tables and entities Relationships 
are explicitly given by the distribution of foreign keys. Cardinalities may 
have to be inferred, as in the previous cases. 

5. Object Oriented 

Although not widely available, there is great interest in commercial 

0 

database implementations which exploit the use of object oriented design. 
The generalization, aggregation, and inheritance constructs offered by the 
EER model are powerful and desirable data ordering concepts. In the futvire, 
integration of heterogeneous databases including Object Oriented implemen- 
tations will use these descriptive properties of the EER model more fully than 
they are employed here. 
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IV. HETEROGENEOUS DATABASE SCENARIO 



A. THREE OFFICER PERSONNEL ADMINISTRATIVE DATABASES 

Personnel information for the Department of Defense is currently stored 
in a variety of separate and diverse databases. A great wealth of data is avail- 
able, but is maintained by different organizations, using different database 
management systems (DBMS), design philosophies, and hardware platforms. 
Frequently, data that span across several databases need to be retrieved. 
Under the current environment, however, integrating the total information 
presents many difficulties. 

To accomplish the integration, the many conflicts arising between the 
multiple databases must be resolved, to allow global querying of the body of 
data. Platform incompatibility, such as that between diverse Operating Sys- 
tems, manufacturers' physical hardware implementations, etc., is solvable, 
although sometimes at great cost in processing resources. At the level of 
DBMS heterogeneity, programming techniques can be used to translate a 
query appropriate to a relational database into one suitable for searching a 
flat file structure. At the semantic level, conflicts of meaning, and conceptual 
arrangement of data must be reconciled. 

Of the three levels of heterogeneity, semantic conflicts are the most diffi- 
cult to resolve. During the conceptual design of a database application, the 
meaning and structural organization imposed on real world information of 
interest fundamentally influences every subsequent use of that stored data. 
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Even when identiced DBMSs, platforms, operating systems, etc., are consid- 
ered, many conflicts can still arise due to the different meanings assigned to 
the same real world item by different designers. 

This chapter examines three actual administrative databases currently 
in use by various organizations within the Department of Defense to main- 
tain information on commissioned officer personnel. DBMS emd platform dif- 
ferences amongst these databases will be ignored except where these issues 
influence the effort to identify, and classify, semantic conflicts. 

1. Active Duty Military Inventory (ADMI) 

The ADMI database is maintained by the Defense Manpower Data 
Center, and includes data on all active duty military personnel, both officer 
and enlisted. It is a tape-based flat file database, and serves primarily to pro- 
cess batch transactions for various reports of interest to the manpower office 
of the Secreteuy of Defense. Information on Naval commissioned officers is 
therefore available as a subset of the records of the ADMI database. A partial 
database specification for ADMI is presented in Appendix A. While not com- 
plete, in terms of complete data definition, the level of detail available is 
representative of what might actually be available during the course of inte- 
grating a multidatabase application. Reasonable assumptions have been 
made as to exact attribute definitions, in some cases to illustrate a pairticular 
point of potential semantic conflict. 

The ADMI database stores basic information of interest to the per- 
sonnel administration function, such as name, rank, social security number, 
and date of birth, sex, race, etc. It also keeps data on marital status, number 
of dependents, and whether a member's spouse is also a member of the mili- 
tary. In addition to these facts, the ADMI database contains an extensive 
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number of statistical elements concerning a member's status on original 
entry to military service. This includes height, weight, test form number, 
both raw and adjusted scores for the Armed Services Vocational Aptitude 
Battery (ASVAB), and place of entry into the service. 

2. Officer Personnel Information System (OPINS) 

The OPINS database is maintained by the Bureau of Naval Person- 
nel to track commissioned officer assignment, promotion, and qualification 
status. Like the ADMI, it is a flat file database, and it theoretically contains 
the entire population of interest for this scenario. A partial specification is 
presented in Appendix B. Similar assumptions as to attribute definitions 
have been made, but in both cases, attribute names have been taken directly 
from the specification as listed in the appendix. 

OPINS stores similar common personnel information to that in the 
ADMI database, such as name, rank, sex, etc.. The data reflects important dif- 
ferences in the OPINS area of interest, however. It contains relatively detailed 
data about an officer's educational history, both civilian and military, as well 
as the military qualifications resulting from that training. The officer's pro- 
motion status £md history is captured very explicitly, including year group, 
precedence number, and the dates of accession to each rank. The unit 
assignment data in OPINS differs from the brief essentials kept by the ADMI 
database in being far more extensive. Historical assignments, by billet num- 
ber, primary and collateral duty, dates assigned, and projected rotation date 
for the current assignment are maintained for each officer. 
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3. Inactive Manpower and Personnel Management 

Information System (IMAPMIS) 

The IMAPMIS database is maintained by the Naval Reserve Force as 
an integrated repository of information on all members of the Naval Reserve. 
This includes both officer and enlisted reserve personnel, as well as active 
duty Naval personnel in the Treiining and Administration of Reserves (TAR) 
field. It is a relational database and is the most recently implemented of the 
three. Partied table definition for IMAPMIS is presented in Appendix C. Fewer 
assumptions at the attribute level were reqiaired in analyzing IMAPMIS, as the 
available definition is far more complete than for ADMI or OPINS. 

Like the ADMI database, and the OPINS, the IMAPMIS maintains the 
essential administrative data needed by the personnel function (neime, rank, 
pay entry base date, etc.). It also stores a wide variety of unique information 
specific to the Naval Reserve manpower management process. This indudes 
reserve unit affiliation, in addition to mobifization unit assignment, last paid 
drill, total credited drills, whether drills were voluntary or mandatory, and 
retirement points acciimulated. The training data captured by the IMAPMIS is 
also the most extensive of the three systems, including the information avail- 
able in both the OPINS and the ADMI databases, as well as data elements indi- 
cating reserve officer training accomplished by enlisted members, reserve 
mobilization training evolutions, and service experience in mifitary opera- 
tions. As was seen in the variation of informationed content between the ADMI 
and OPINS databases, the spedfic facts recorded in the IMAPMIS reflect the 
different area of interest of its users. 

In all cases, the assigned definitions are intended to be realistic, 
and consistent with the design of the database in question. The assumed 
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definitions should not be taken as representative of any actual data definition 
in use for the given database, and are only presented for the purpose of 
illustration. 

B. SOURCES OF HETEROGENEITY 

1. Database Management System / Platform 

It is obvious that the three (ADMI, OPINS, and IMAPMIS) have differ- 
ent implementation details. While ADMI and OPINS may in fact run under 
identical DBMSs, hardware, and operating system, IMAPMIS certainly runs 
under an incompatible DBMS, and has a different hardware/operating system 
combination. Any heterogeneity this situation may or may not introduce to 
the multidatabase scenario tmder discussion is not germane to this analysis. 
The focus of this analysis is the effort to identify and resolve the semantic 
conflicts which are present. 

2. Semantic 

Since the three databases under discussion were all developed and 
implemented at different times, by different organizations, for different pur- 
poses, it should not be surprising that very different conceptual 
arrangements have resulted. A review of the Defense Manpower Data Center 
ADMI database reveals a very different area of interest, for instance, than 
that of OPINS. The Defense Manpower Data Center is concerned with issues 
such as total military end-strength, allotment of personnel resources to 
budgetary program elements, and the like. OPINS, on the other hand, being a 
service-specific database, captures a very different set of data for a given 
officer, including present and past assignments by billet, and promotion year 
group. There is a large overlap in the area of basic information (name, rank. 
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SSN, etc.), but it is obvious that the designers of OPINS were interested in a 
different view of the commissioned officer than that presented by ADMI. 
IMAPMIS data overlaps both ADMI and OPINS, and additionally captures 
information of specific interest to the personnel management of the Reserve 
force, such as Reserve unit affiliation, and last credited drill period. 

Besides varying areas of informational interest, the three database 
design efforts employ very different naming conventions. ADMI largely 
employs plain language labels for data elements which are easily imderstood. 
OPINS uses much more service-specific language, which would be obvious to 
someone familiar with Navy terminology, but perhaps confusing to a layman. 
IMAPMIS follows the OPINS terminology closely, but since it is described in a 
particular DBMS language, the entity and attribute names are awkward and 
not always easily matched to their corresponding elements in ADMI and 
OPINS. This results in a great deal of semantic heterogeneity, since it becomes 
an important issue to resolve whether each designer means the same thing 
when an attribute is called UNIT, for instance. 

C. ATTEMPTING TO QUERY THE TOTAL BODY OF DATA 

Information on the population of interest. Naval Commissioned Officers, 
is contained across all three databases. Frequently, queries that span the 
three databases need to be answered. For example, we may like to retrieve all 
available data for a given value of a key attribute, such as Social Security 
Number. Obviously, a query against any one of the databases cannot ensure 
this. Information on all officers may not exist in a single database. For 
instance, an active duty officer not in the TAR program will not appear in 
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IMAPMIS. As was pointed out, different attributes representing real world 
items of interest are contained in different databases. 

To guarantee no loss of any information already available, we must 
somehow present a global query which will be processed against a global 
schema that represents the integration of the three databases, and return the 
requested information. Even when this is accomplished, the fiuther problem 
of conflicting data remains. In other words, due to differences in update 
times, data entry errors, etc., even identical attributes for the same officer 
may contain different data values. 

Therefore, because of the different data organizations, naming conven- 
tions, and particular information available in each database, as well as the 
situation where conflicting data represents the same information, there must 
be some means of resolving the inevitable semantic conflicts which will arise 
when particular attributes are returned. 

D. INTEGRATION STRATEGY 

To allow queries that span several databases, a federated database 
approach is suggested. Following this approach, each local database is con- 
sidered a logical component in the federation. These components are tied 
together by a global schema that represents the integration of the local 
schemas. To accomplish this several steps are necessary. First, each local 
schema is transformed into an equivalent schema in a semantically rich 
common data model. This step is carried out in the following sections using 
the Extended Entity Relation (EER) model, applying the concepts and dia- 
grammatic conventions covered in Chapter III. Second, a systematic com- 
parison is made across the individual equivalent schemas between 
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corresponding entities, and attributes, searching for potential conflicts. Third, 
after resolving semantic conflicts, the local schemas in the common data 
models are merged to form a global schema. Fourth, an additional control 
component, known as the global controller, is required. The global controller 
maintains the definition of the global schema and acts as a coordinator and 
translator: it receives a global query, possibly in a user specific language; 
translates it into an equivalent query on a common-model global schema; 
decomposes and translates the common-model query into subqueries to the 
corresponding local database sites for processing; collects the results; identi- 
fies and resolves data content conflicts; reformats the result; and sends it 
back to the originating site. The first three steps of this process are covered in 
detail in the remainder of this thesis. The theoretical design of the query and 
resolution components of the global controller described in step fom:, above, is 
related to the levels of schematic and data heterogeneity covered by this 
analysis. Chapter VI will show how the methods of semantic conflict resolu- 
tion developed can be applied to the design of the global controller. The spe- 
cific implementation of the global controller deals largely with the levels of 
DBMS and platform heterogeneity mentioned earlier, and is outside the scope 
of this study. 

Due to the large number of attributes comprising the real world sample, 
this analysis extracts a representative subset of attributes from each 
database. This subset adequately illustrates the methodology employed. 
Similar treatment of the complete ADMI, OPINS, and IMAPMIS schemas would 
follow identical procedures. 

The remainder of this chapter deals with transforming the ADMI, OPINS 
and IMAPMIS schemas into equivalent EER schemas. Chapter V uses these 
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diagrams to identify a comprehensive set of potential semantic conflicts 
among equivalent EER schemas using examples from the three databases. 
Chapter VI employs this classification framework to suggest potential solu- 
tions for each type of conflict and complete the realization of a comprehensive 
global schema. 

1. Translation of ADMI Into EER Form 

Deriving an EER diagram from the ADMI database was begun by 
selecting an appropriate subset of attributes from the total which comprise 
each ADMI record. The specific attributes were chosen to ensure that similar 
information was analyzed from each database, as well as to realistically show 
the differences in domains of interest. Once the set of data elements was 
determined, they were grouped as attributes of a logical arrangement of real 
world entities. These entities were then related based on a reasonable 
interpretation of the conceptual view which ADMI is attempting to represent. 

Since ADMI is a flat file, all data elements it contains can in some 
sense be considered simple attributes of a single entity. However, certain 
analytical standards are applicable. The repeating set of fields used to repre- 
sent LANGUAGE, for instance, clearly represents a multi-valued composite 
attribute which is appropriately diagrammed as a separate entity. Since the 
ADMI database contains information on all active duty personnel, fields which 
take on different values depending on officer/enlisted status, and specific ser- 
vice membership, can be diagrammed as defining attributes of 
Generalization/ Specialization relationships. This is how the relationship of 
active duty member to service member to specific service officer is modeled. 
The shaded entities for other service member, and naval enlisted, are 
included in the diagram only to indicate the structiure of the relationship, and 
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are not populated with the describing attributes they would possess in a 
complete representation. In the actual implementation of ADMI, there would 
not be a separate instance of the UNIT entity, since it is merely a set of 
attributes of the member record. In reverse engineering from a flat file 
database to an EER, however, it is proper to represent UNIT as an entity, 
having existence independent of its relation to a particular member. In this 
way, the most general level of conceptualization is achieved. This is analo- 
gous to the convention which would be followed in modeling the real world 
top down to an EER schema. The particular relation of unit and member in 
the actual ADMI is only an artifact of a given implementation decision. 

Obviously, some of the results of the flat file to EER translation 
shown below are based on arbitrary assumptions, and may be open to chal- 
lenge. The process detailed here is representative of what would be done in a 
more rigorous manner if, for instance, the multidatabase designer had access 
to information on the intentions of the designers of the original database. At 
the conceptual level of this treatment, the effort is to illustrate the procedime, 
and ensure that all the various potential conflicts are enumerated. While 
detailed translation of the ADMI might result in a slightly different EER dia- 
gram, it is not felt that any undue artificiality has been introduced into the 
example. 

The entity structure extracted from the ADMI database is presented 
in Appendix D. The completed EER diagram of the extracted attribute subset 
is shown in Figure 14. 
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Figure 14. EER Schema for the Active Duty Military Inventory 
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2. Translation of OPINS Into EER Form 



An identical translation process was performed on the OPINS flat 
file database. The extracted subset in this case resulted in a substantially dif- 
ferent EER diagram, though very similar attributes were utilized. This points 
out the semantic differences which arise in each designer's representation of 
the real world. The entity UNIT, for instance, is derived from a repeating com- 
posite attribute in the OPINS record, and is diagrammed as a separate entity 
having a one-to-many relationship with COMMISSIONED_OFFICER. This is dif- 
ferent from the relationship between MEMBER and UNIT in the ADMI exeunple, 
because OPINS actually captures a historical record of unit assignments, vice 
simply the current one. Likewise, the entity YEAR_GROUP has no matching 
construct in ADMI, since this represents information of interest solely to the 
designers of OPINS. 

Similar caveat is offered regarding the exact process of translation 
for OPINS as was true for ADMI. No claim is made for the fidelity of the EER 
diagram as translated, relative to the actueJ real world view intended by the 
OPINS designers. However the results given here are representative of the use 
of the EER process and model to formulate a bottom up conceptual schema 
from an existing database. 

The entity structure extracted from the OPINS database is pre- 
sented in Appendix E. The completed EER diagram for OPINS attribute subset 
is shown as Figure 15. 
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Figure 15. EER Schema for the Officer Personnel Information System 
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3. Translation of IMAPMIS Into EER Form 



Unlike the potentially arbitrary assumptions required in translat- 
ing the ADMI and OPINS flat files to EER form, the conversion of IMAPMIS is 
more straightforward. Since IMAPMIS is a relational database, in most cases 
there is a simple correspondence between the IMAPMIS tables as defined, and 
the entities modeled. Some entities, such as LANG, are not specified uniquely 
as separate tables by the IMAPMIS specifications, though they are referred to 
as individual record types. Relationships for the IMAPMIS EER diagram are 
easily derived from the location of foreign keys within the tables. 

The entity structure extracted from the IMAPMIS database is pre- 
sented in Appendix F. The EER diagram for the IMAPMIS subset as translated 
is shown as Figure 16. The shaded entity for enlisted member is included as a 
place holder only to indicate the structure of the relationship, and is not 
populated with the describing attributes it would possess in a complete 
representation. 
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Figure 16. EER Schema for the Inactive Manpower and Personnel 
Management Information System 
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V. A FRAMEWORK FOR SEMANTIC HETEROGENEITY 



A. CLASSIFYING SEMANTIC CONFLICTS 

With the candidate databases transformed into equivalent EER schemas, 
potential semantic conflicts can be identified. To facilitate the classification 
and resolution of semantic conflicts, a framework for identifying such con- 
flicts is developed in this chapter. The classification framework presented 
here recognizes two broad kinds of conflict. Schematic conflict, which occurs 
at the level of the conceptued organization and definition of the database, and 
data level conflicts, which occur between the actual data values returned 
from the different databases by a query against the global schema. 

Procedurally, the individual EER schemas are matched against each 
other in a top-down fashion, and conflicts as they are noted are assigned to 
sub-categories of the schematic division. When all possible schematic conflicts 
have been classified, a more speculative analysis of possible data-level con- 
flicts is conducted, to determine potential problems. The remainder of this 
chapter presents the classification freimework using examples resulting from 
the analysis of the ADMI, OPINS, and IMAPMIS databases. References to the 
assumed detailed data definitions, which are provided for extracted attribute 
subsets in Appendices G through I for ADMI, OPINS and IMAPMIS, respectively, 
are intended to be complete enough so that immediate cross checking is not 
required. Analysis of the appendices will reveal many potential conflicts not 
explicitly shown below. Chapter VI offers some potential resolution strategies 
for resolving the semantic conflicts between the candidate databases, and 
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completes the integration of a global EER schema which would be used to 
guide the formulation of queries against the complete body of data. 

1. Schematic Level Conflicts 

As mentioned above, this type of conflict arises from the conceptual 
arrangement and definition of the databases. Since all three databases have 
been represented in an equivalent EER form, the process of identifying these 
disparities is simplified. Top-down analysis of the individual database 
schemas yields three subcategories of schema level conflict: entity conflicts; 
attribute conflicts; and entity- attribute conflicts. Entity level conflicts occur 
between equivalent entities. Attribute level conflicts specify discrepancies 
among like attributes. Entity-Attribute level conflicts concern differing orga- 
nization of data, such as representing the same information as an attribute in 
one case, and as an entity in another. 

Each subcategory will be detailed in order, with examples from the 
three databases imder discussion. 

a. Entity Level Conflicts 

Entity level conflicts occur when like real world entities have 
differing names (synonyms), or differing entities have identical names 
(homonyms). Entity structures as represented by the database schema may 
also conflict. A third entity level conflict occurs when relationship constraints 
between entities differs across two or more schemas. 

(1) Naming Conflicts. An example of a synon)nn problem is 
the entity COMMISSIONED.OFFICER, in the OPINS database, contrasted with 
the equivalent MEMBER-OFR in IMAPMIS. Both refer to instances of a particu- 
lar commissioned Naval officer, but in an integrated schema, a single entity 
name must be specified. Similarly, IMAPMIS names a given coiirse of college 
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education for a given officer EDUC, but OPINS names the same entity 
EDUCATION. Relatively obvious dissimilarities such as this are simple to 
resolve, but all kinds of complex synonym conflicts can occur in real world 
cases. 

Homonyms are a more serious problem, because in this 
case, different real world entities are given the same name. Identification of 
homonym conflicts requires more detailed dissection of each entity, to deter- 
mine its actual meaning. When completely differing concepts are captured by 
like-named entities, this must be rigorously checked, since an uncritical map- 
ping of the two into a single entity in the global schema will give a meaning- 
less result. An example of homonyms is apparent when the UNIT entity from 
ADMI is compared with UNIT in OPINS. In the ADMI database, UNIT refers to 
an instance of a military activity, such as a ship or squadron. In OPINS, how- 
ever the same name is given to an entity which is actually an officer's 
assignment to a given billet, at a given unit. It is obvious that even though 
identical names are assigned to these entities, a very different semantic con- 
tent is represented in the two. 

(2) Entity Structure Conflicts. This is caused by overlapping 
or incomplete attribute sets for equivalent entities. This can arise due to fail- 
ure of one database to include certain attributes captured by another because 
it was not considered of interest. Information concerning an entity might also 
be represented by the attributes of other entities in a Generalization/Speciali- 
zation relationship. 

An example of missing attributes is found in UNIT which 
in the OPINS database does not include an attribute for the unit Zip code, 
while UNIT in the ADMI does. The designers of the OPINS did not choose to 
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store this particular information. Likewise, LANGUAGE in the ADMI has an 
attribute for listening proficiency, while LANGUAGE_SKILL in OPINS contains 
an attribute for writing skill. Again, this results from differing areas of 
informational concern when the original databases were designed. 

Overlapping attributes are found in MEMBER-OFR 
(IMAPMIS), which does not contain the member's name, contrasted with 
COMMISSIONED.OFFICER (OPINS), which does. This is due to the General- 
ization/Specialization relationship of MEMBER-OFR to MEMBER in IMAPMIS. 
The member's name is represented by an attribute of MEMBER, Thus the 
same information is present, but at a different level of the schema. Since 
OPINS captures information on a more limited population than IMAPMIS, the 
attributes are arranged in a different manner. 

(3) Constraint Conflicts. When the cardinality of relationship 
between two entities varies across two or more schemas, it is termed an 
entity constraint conflict. This is shown by the n-to-1 relation between UNIT 
and ACTIVE_DUTY_MILITARY_MEMBER in the ADMI, as opposed to the 1-to-n 
relation between COMMISSIONED.OFFICER and UNIT in the OPINS. If the 
structure of the entities manifesting a constraint conflict is indeed similar, 
this again indicates a basic semantic conflict regarding just what the 
databases are attempting to represent. It will be shown that in this particular 
instance, the constraint conflict actually results from a structure conflict 
because the two unit entities are dissimilar. However, constraint conflicts are 
independently a valid classification of semantic heterogeneity. 

Another type of entity constraint conflict occurs when 
there is a difference in participation requirements for equivalent relation- 
ships in two databases. An example of this is given by the partial 
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participation of MEMBER in the 'Has' relationship with SCRTY-CLEAR in the 
IMAPMIS. Contrast this with the total participation of COMMISSIONED_ 
OFFICER in the 'Certified For' relationship with SECURITY.REQUIREMENT in 
the OPINS. This type of conflict arises from differing views of the informa- 
tional domain by two groups of users. Since IMAPMIS defines the relation- 
ship of a generalization member to a security clearance (e.g., all members 
may have a security clearance), the participation constraint conflicts with 
that of OPINS, which models total participation (all officers must be certified 
for one and only one security requirement). 
h. Attribute Level Conflicts 

Attribute level conflicts cover the same conceptual range as the 
entity level. Attributes representing the same real world informational ele- 
ment can have differing names, or differing attributes identical names. 
Attribute structure conflict is analogous to entity structure conflict. Attribute 
constraint conflict differs from entity constraints since it is due not to rela- 
tionship cardinality or participation constraints, but to differences in the 
attribute definition. 

(1) Attribute Name Conflicts. Like entity name conflicts, this 
category comprises synonyms and homonyms. The reasons for this type of 
conflict are the same as for the entity level. Samples of attribute S3monyms 
from the databases of interest are DESIGNATOR (OPINS) and DESIG 
(IMAPMIS), as well as ORIGINAL_SOURCE_CODE (OPINS) and SOURCE_OF_ 
COMMISSION (ADMI). These both illustrate identical real world facts called by 
different names. 

An example of homonyms is UNIT_IDENTIFICATlON_CODE 
(ADMI) contrasted to UNIT_IDENTIFICATION_CODE (OPINS). These two 
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identically named attributes represent different real world facts. The ADMI 
captures Department of Defense wide unit identification, while the same 
attribute in the OP INS is actually a composite attribute made up of 
PARENT_UIC and ACTUAL_UIC with the latter attribute corresponding to 
UNIT_IDENTIFlCATION_CODE. 

(2) Attribute Structure Conflicts. These are similar to entity 
structure conflicts, and arise from information being represented by an 
atomic attribute in one database, and the same information as either two 
separate attributes, or part(s) of a composite attribute in another. 

Equivalent information is captured by RACE_ETHNIC in 
ADMI, and the two attributes RACE and ETHNIC in the OPINS. This case 
demonstrates a single attribute to multiple attribute structure conflict. Alter- 
natively, the real world value of an officer's warfare designator is represented 
by the atomic attribute DESIG in the OPINS, while the IMAPMIS database 
breaks this information down into DESIG-CAT and DESIG-STAT, which 
themselves are part of the composite attribute DESIG. 

(3) Attribute Constraint Conflicts. Unlike constraint conflict 
at the entity level, attribute constraint conflict occurs due to the detailed 
description of the attribute itself. Thus equivalent real world facts are repre- 
sented by attributes which have different data definitions. This can be mani- 
fested as type clashes (e.g., character opposed to numeric), length clashes 
(e.g., larger or smaller number of characters in a given field), and range 
clashes (e.g., different allowable set of values for equivalent facts). Type con- 
flict is quite common when dealing with databases designed for different 
operational implementations, while range and length conflict results more 
from semantic design choices. 
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An example of type and length clash is given by SOCIAL. 
SECURITY.NUMBER (ADMI), which is defined as a 4 byte packed integer, 
while the identical information is defined as a nine numeric integers (which 
can be handled as a string by modem processing techniques) for SSN (OPINS). 
The two Year/Month/Day attributes DATE.OF.BIRTH (ADMI) and DOB 
(IMAPMIS) are similarly mismatched, as the first is stored as a 3 byte packed 
integer, and the second as a 6 character string. 

Allowable value, or range, clash, is also illustrated by the 
two date attributes just noted. In the IMAPMIS, the member’s date of birth is 
defined as having a value between January 1, 1900 and December 31, 1999. 
An incompatible range is defined for the ADMI, since the date of birth in this 
database can take on any 6 digit value which corresponds to a valid date (in 
other words, the date is only constrained to be a date, and could represent a 
value outside that allowed for the same date in the IMAPMIS). 
c. Entity Attribute Conflicts 

Entity attribute conflicts arise when equivalent information is 
represented as an attribute of one entity in a given database, but as a sepa- 
rate entity in another database. This situation arises, like other structural 
semantic conflicts, because of conceptual design choices concerning the 
desired organization of information. A particular data element might be con- 
sidered to be part of the aggregate data defining an entity by one design 
team, but the same element(s) might be considered important enough to set 
aside as an independent entity by another team. As in other structured 
conflicts, entity attribute conflicts have the effect of placing corresponding 
information at different levels of the schema. 
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An example of this is the member's security clearance informa- 
tion, which in the OPINS and IMAPMIS databases is represented as separate 
entities; SCRTY-CLEAR in the IMAPMIS, and SECURITY_CLEARANCE in the 
OPINS. The equivalent real world information (though less detailed) is stored 
by the ADMI as the composite attribute SECURITY_lNVESTIGATION, and the 
atomic attribute SECURITY_CLASSIFICATION, both belonging to the ACTIVE. 
DUTY_MILITARY_MEMBER entity. Different views of the real world bring 
about these differing conceptual arrangements of the same information. 

d. Completed Schematic Level Classification Framework 
It should be apparent from the examples give above, that 
multiple simultaneous conflicts can exist at any level. Entities which have 
synonym conflicts can at the same time have structural and constraint mis- 
matches. Equivalent attributes are often subject to both name, and structure/ 
constraint conflicts. The value of the classification framework presented here 
is that it provides a systematic analytical tool for the identification of all 
schematic conflicts. 

The full schematic classification portion of the framework is 
reiterated in Figure 17. 

2. Data Level Conflicts 

The full enumeration of semantic conflicts must also accoimt for 
data level conflicts, even when all possible schematic conflicts have been 
identified and resolved. This is because even identically defined and named 
attributes may contain actual data values which do not agree. Data level 
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1. Entity Level Conflicts 
Naming Conflicts 

Synonyms (Same real world entities have same name in different dBs.) 
Homonyms (Different real world entities have same name in different dBs.) 

Structure Conflicts 

Different attribute sets 
Missing attributes 
Overlapping attributes 

Relationship Constraint conflicts 

2. Attribute Level Conflicts 
Naming Conflicts 

Synonyms (Same real world entities have same name in different dBs.) 
Homonyms (Different real world entities have same name in different dBs.) 

Constraint Conflicts 

Type clash. (Equivalent real world attributes have different data type 
definitions in different dBs.) 

Ranee clash. (Equivalent real world attributes of the same type data 
have different allowable range definitions in different dBs.) 

Structure Conflicts 

(Equivalent real world information is represented as a single attribute 
in one dB, and as either two separate, or part(s) of a composite 
attribute in another.) 

3. Entity Attribute Level Conflicts 

(Equivalent information is represented as an attribute of an entity in 
one dB, and as either a separate entity, or attribute(s) of a Generalization/ 
Specialization entity structure in another.) 



Figure 17. Framework of Conceptual Schema Level Heterogeneity 



conflict can be broken down into two main t}T)es; inconsistencies, and repre- 
sentation conflicts. Inconsistencies refer to the case where two equivalent 
values for an identical instance, such as a date, or rank, do not agree when 
the results of a query are returned from two or more databases. Data 
representation conflicts cover a much more diverse spectrum of possible 
conflicts, arising from dissimilar expressions, dissimilar units, and dissimilar 
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precisions. Incorporating these potential data conflicts into the classification 
framework completes this chapter, and results in a valuable methodological 
tool for complete identification of semantic heterogeneity. 
a. Inconsistencies 

Inconsistencies are easily conceptualized, and unfortunately 
very common, semantic conflicts. They arise from the real world process of 
creating, updating and maintaining databases. Different update times, 
human data-entry errors, or incorrect data submitted to be stored can all 
produce inconsistency. An inconsistency results when one database returns a 
given value for a specific real world element of interest, and another database 
returns a different value for the same element. This conflict is independent of 
any schematic naming or other conflict. While simply imderstood, and easily 
identified, inconsistency is the most difficult conflict to resolve. Often there is 
simply no other method available to reconcile an inconsistency except to go 
back to the original source of the data value, and determine which (if any) of 
the conflicting values are correct. There are other potential ways to approach 
the resolution problem, which will be addressed in Chapter VI, but none 
which are guaranteed to provide a general solution. 

A simple, and obvious, example of an inconsistency is the ADMI 
database returning a PAY_GRADE of 4, corresponding to 04, or Lieutenant 
Commander, for a given commissioned officer, while the OPINS returns a 
value of 3 for the attribute PRESENT_GRADE, indicating a rank of Lieutenant. 
One of the two is incorrect, since an officer only holds one rank in the real 
world. Techniques for determining which value to use will be presented in 
Chapter VI. 
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h. Data Representation Conflicts 

Data representation conflicts occur when incompatible sym- 
bols, units of measurement, or degrees of precision are used to store equiva- 
lent data elements. In general, this is due to design choices at the conceptual 
level caused by differing areas of interest, or levels of concern, about given 
real world information on the part of the database designers. One organiza- 
tion may wish to have very specific and precise information about an 
attribute of interest, while another organization might be satisfied with a 
general categorization of the same data. Alternatively, one design team may 
be accustomed to dealing with coded references to external look up tables to 
represent values, while another set of designers prefer to more explicitly 
represent values with characters. The physical implementation details of the 
hardware in use, and the individual processing procedures of the DBMS also 
influence the occurrence of data representation conflicts. 

(1) Dissimilar Expressions. Dissimilar expression conflicts 
come about when two or more databases use the same t 3 q>e of data, but the 
values stored in the attribute have different meanings. For example, equiva- 
lent information might be represented by different ch 2 U*acter strings. An 
instance of this is ACTIVITY_TITLE, a character attribute which in OPINS 
represents the UNIX's text name, such as 'COMSURFRON THREE'. Contrast this 
to the attribute ACTY-LANG-NAME, also a character attribute, which IMAPMIS 
uses for the same information. The actual string stored in this attribute for 
the equivalent unit might be 'CMDR, SFC SQDRN 3'. Thus given character 
strings returned from the two databases, may or may not have the same 
meaning. 
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(2) Dissimilar Units. Dissimilar unit conflicts are caused by 
the storage of information, particularly absolute or relative measurements, in 
attributes with the same type, and length, and range, but with allowable 
values defined in different units. In the analysis of the administrative ADMI, 
IMAPMIS, and OPINS personnel databases, examples of this particular type of 
conflict are rare, since few measurements are maintained. One illustration is 
the UNIV-DUR attribute, in the IMAPMIS database, which represents a 2 char- 
acter value for the length of an officer’s course of instruction in weeks. This 
choice of units comes about through a domain analysis which indicates that 
the population of interest (Naval Reserve commissioned officer personnel) are 
likely to take shorter courses as opposed to longer courses pursued by active 
duty personnel. On the other hand, the DURATION attribute in the OPINS is 
also two characters (although stored as numeric integers), but represents the 
length of a course of instruction in months. If an attempt is made to match 
these two values, a dissimilar units conflict will occur. The value 20, returned 
from both, wovdd mean both 20 weeks, and 20 months, respectively. 

(3) Dissimilar Precisions. This type of data level conflict is 
due to real world information being specified at the attribute level in different 
degrees of precision. In other words, the same value returned from two or 
more databases has a different meaning because an identical range is subdi- 
vided with different levels of granularity. Consider READING_PROFICIENCY 
from the ADMI database. This 1 character attribute is constrained to the 
numeral values of zero through nine, with nine being defined as fluent, and 
zero as unacceptable, with eight gradations completing the allowable values. 
This provides the DMDC very precise information on the foreign language 
reading ability of personnel in the database. The OPINS definition for 
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SKILL_READ, however, while it is also a 1 character attribute, groups the 
allowable ten numeral range into four sub ranges, from poor, to outstanding. 
Obviously, although the two attributes store equivalent information in identi- 
cal formats, the values from OPINS cannot be considered to give an identical 
level of detail as those from ADMI, since within sub ranges any value will re- 
sult in one of the four broad categories being returned as a result of a query, 
c. Completed Data Level Classification Framework 

The proceeding data level conflicts will not all become 
apparent in the process of integrating a multidatabase from a set of 
heterogeneous databases. Dissimilar expressions and dissimilar precisions 
may or may not be identified, depending on the depth of description available 
to the integration effort in terms of detailed data definitions. The actual 
attribute definitions for the three candidate databases were not considered in 
this study, and the assigned data definitions have been designed to illustrate 
each of the possible conflict types. This is representative of the level of 
analysis required to identify the full range of semantic conflicts. 

Unfortunately, data inconsistencies will almost certainly not 
become obvious, until data from global queries is returned. No level of purely 
conceptual analysis will be able to preclude wrong data, mismatched update 
times, or data entry error. Inconsistencies are included in the framework 
because they represent one very important type of semantic conflict, albeit 
one not resolvable by the conceptual integration effort. 

The complete data level classification portion of the framework 
is reiterated in Figure 18. 
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1. Inconsistencies 

(Equivalent information returned from different dBs disagrees in value.) 

2. Data representation conflicts 

Dissimilar Expressions 

(Equivalent information returned from different dBs is represented by 
incompatible values.) 

Dissimilar Units 

(Equivalent information returned from different dBs is expressed in 
different units.) 

Dissimilar Precisions 

(Equivalent information returned from different dBs is given to different 
degrees of accuracy.) 



Figure 18. Framework of Data Level Heterogeneity 



B. THE SEMANTIC HETEROGENEITY FRAMEWORK 

The fully realized framework for classifying semantic heterogeneity can 
now be applied to any set of existing databases which have been transformed 
into equivalent EER schemas. By using a systematic approach to analysis of 
each equivalent set of entities, attributes, and relations, all possible semantic 
conflicts will be identified. Of course, for the useful integration of a set of 
heterogeneous databases into a global schema, these conflicts must somehow 
be resolved. Chapter VI addresses this issue in a general way, offering some 
possible solutions for each category of semantic conflict. Applying these 
methods of resolution, the three administrative databases under con- 
sideration will be integrated into a coherent, globally addressable EER 
schema. The specific instrumentalities of resolving each t)q)e of conflict, as 
well as a rigorous analysis of general solutions, is left to future research. 

The complete framework for semantic heterogeneity is shown below as 
Figure 19. 
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Schematic Level 



1. Entity level conflicts 

Naming conflicts 

Synonyms (Same real world entity has different names in different dBs.) 
Homonyms (Different real world entities have same name in different dBs.) 

Structure conflicts 

Different attribute sets 
Missing attributes 
Overlapping attributes 

Relationship Constraint conflicts 



2. Attribute Level Conflicts 

Naming conflicts 

Synonyms (Same real world attribute has different names in different dBs.) 
Homonyms (Different real world attributes have same name in different dBs.) 

Constraint conflicts 

Type clash. (Equivalent real world attributes have different data type definitions 
in different dBs.) 

Range clash. (Equivalent real world attributes of the same data type have different 
allowable range definitions in different dBs.) 

Structure conflicts 

(Equivalent real world information is represented as a single attribute 
in one dB, and as either two separate, or part(s) of a composite 
attribute in another.) 



3. Entity Attribute Level Conflicts 

(Equivalent information is represented as an attribute of an entity in 
one dB, and as either a separate entity, or attribute(s) of a 
Generalization/Specialization entity structure in another.) 



Figure 19. Complete framework for Semantic Heterogeneity 
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Data Level 



1. Inconsistencies 

(Equivalent information returned from different dBs disagrees in value.) 

2. Data representation conflicts 

Dissimilar Expressions 

(Equivalent information returned from different dBs is represented by 
incompatible values.) 

Dissimilar Units 

(Equivalent information returned from different dBs is expressed in 
different units.) 

Dissimilar Precisions 

(Equivalent information returned from different dBs is given to different 
degrees of accuracy.) 



Figure 19. Complete framework for Semantic Heterogeneity (Concluded) 
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VI. SOLUTIONS FOR RESOLVING SEMANTIC HETEROGENEITY 



A. GENERAL APPROACH 

With the candidate databases for integration in equivalent schemas, and 
all potential sources of semantic conflict identified using the framework pre- 
sented in the preceding chapter the final step can be completed. This is to 
consolidate them into a single global schema which can be used to guide the 
formulation of queries against the total set of available data. Additionally, 
internal design considerations of the global controller component which actu- 
ally manipulates the federated database are developed during this stage of 
the integration process. It is during this phase that conflicts are resolved, 
while not losing any information. 

The spectrum of possible solutions to identified semantic conflicts ranges 
from complete redesign of a new integrated database, to maintaining the sep- 
arate databases, under some query scheme which allows them to be 
addressed as one. This federated database approach was described in Chapter 
IV, and this chapter presents in the federated database context some very 
general rules which can be used to resolve the conflicts noted in Chapter V. 
These rules apply both at the level of schema integration and data conflict 
resolution. The question of verifiably correct solutions to the various types of 
semantic conflict is a rich field of future research on integrating heteroge- 
neous databases. 

The resolution strategy presented here proceeds in two parts, forming 
the globed schema, and dealing with data conflicts which are returned against 
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queries. First, schema conflicts between the local schemas in the common 
data models are resolved, allowing them to be merged to form a global 
schema. This guides the user in formulating queries against the total body of 
data. Suggested methods for choosing the structure of the global schema are 
offered below for each type of conflict. Design of the global controller compo- 
nent is guided by these choices in the processing of queries. Second, the global 
controller is provided with the complete definition of the global schema, 
including appropriate means of mapping from the global schema to the com- 
ponent databases, as well as the information needed to translate, compare, 
and resolve the various data conflicts which will arise when data is returned 
from a global query. 

The design of the global controller is influenced by the understanding of 
semantic conflicts gained during the re-engineering process. This component 
deals with semantic conflicts during query processing and retrieval, as well 
as resolving data level conflicts which occur when inconsistent data is 
returned for the same real world item of interest by the component 
databases. During querjdng and retrieval, the controller must know how to 
map from the entity and attribute names chosen for the global schema back 
to the actual names used in the component databases. When data is returned, 
the controller must have means to translate various attribute definitions into 
a common form, compare their values, and if possible, resolve data level con- 
flicts before presenting the information to the user. In both these aspects, the 
re-engineering analyst uses detailed knowledge of the semantic conflicts 
existing among the component databases gained through the process 
described in this thesis. 
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The basic assumption of this chapter is that all available information is 
to be captured in the global schema. In other words, no attributes from emy 
database are to be excluded if they provide data not represented elsewhere by 
equivalent attributes. Where data is duplicated, the rules presented below 
guide the choice of alternatives for inclusion in the global schema. Thus the 
union of attribute sets from equivalent entities is most often suggested, which 
ensures that missing attributes from any one database are not lost. 
Heuristics that identify which of several redundant overlapping attributes 
may be safely disregarded complete this part of the resolution process. 

Another underl3dng assumption is that data included in the global 
schema shoiild be represented in the highest level of definition or precision 
available. Therefore when several attributes capture equivalent information, 
the most precisely defined, or that which specifies the highest available 
degree of precision is chosen over redundant alternatives. 

A final general comment on resolving semantic conflicts is that in many 
cases, there will simply be no other choice them to go back to the user. This is 
particularly true in the case of data inconsistencies as will be noted below. 
Re-examination of the real world data set might also be required to resolve 
cases of wrong data, though there are rules of thumb which can be applied 
with some risk of error. 

The following section restates the specific semantic conflicts, by type, 
which were used as examples in the preceding chapter. Proposed solutions for 
each type of conflict are offered, with estimates of effectiveness, practicality, 
and certainty of correct resolution where appropriate. The completed global 
schema for the three officer personnel databases is presented at the end of 
the section on schematic conflict. This is followed by a section dealing with 
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data level conflicts, with some considerations for the design of a global con- 
troller component for a federated database application approach to integrat- 
ing them. 

B. PROPOSED SOLUTIONS 

The following examples duplicate, for consistency, the conflicts by type 
which were identified and classified in Chapter V. 

1. Schematic Level Conflicts 

Solutions to schematic level conflicts generally involves renaming, 
combining, or redefining entities and attributes in a practical way to ensure 
the preservation of all originally available semantic content. The global con- 
troller uses name mapping and look-up tables to allow decomposition of 
queries against the entity or attribute name chosen for the global schema 
back to the component databases. With the possible exception of constraint 
conflicts, the integrating designer having a clear understanding of the prob- 
lem domain does not need frequent recourse to the user in resolving this level 
of conflict. 

a. Entity Level Conflicts 

Naming, structure, and constraint conflicts amongst equiva- 
lent entities is resolved by suitably renaming, and combining attribute sets to 
form consolidated global schema entities. Suitable look-up tables are included 
for the global data definition in the global controller to map between these 
global schema names and the existing names utilized at the component 
database level. Analyzing the original semantic intention of the users might 
be required to resolve some entity constraint conflicts. 
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(1) Naming Conflicts. An example of a synonym problem is 
the entity COMMISSIONED_OFFICER, in the OPINS database, contrasted with 
the equivalent MEMBER-OFR in IMAPMIS. Similarly, IMAPMIS names a given 
course of college education for a given officer EDUC, but OPINS names the 
same entity EDUCATION. For obviously equivalent entities such as these, the 
more fully detailed name should be chosen. Alternatively, a name from a 
standardized data definition which appropriately describes the global entity 
could be chosen. 

An example of homonyms is apparent when the UNIT 
entity from ADMI is compared with UNIT in OPINS. In the ADMI database, 
UNIT refers to an instance of a military activity, such as a ship or squadron. 
In OPINS, however the same name is given to an entity which is actually an 
officer's assignment to a given billet, at a given unit. Homonym conflicts such 
as this usually arise because of inadequate specificity of the na min g conven- 
tions employed. In this case, the UNIT entity in OPINS should be completely 
renamed as DUTY_STATION_BILLET_ASSIGNMENT to better reflect its 
intended meaning, with only those attributes which relate to an officers 
assignment to particular billets, current and historical. Remaining attributes 
of the OPINS UNIT entity which deal with the specific unit currently assigned 
will be included with the global unit entity. 

It is appropriate here to mention the concept of orgemiza- 
tionally standardized Fully Qualified Naming (FQN), on which much effort 
has been expended recently. FQN certainly reduces, and seeks to completely 
avoid, semantic conflict between data element names, and applies equally to 
entities and attributes. FQN specifies the semantic meaning of a data element 
in sufficient detail that confusion between merely similar elements is 
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eliniinated. Applied to the homon 3 on example above, FQN would result in a 
name such as that suggested for OPINS, which more accurately indicates the 
semantic function which that entity fulfills (a record of an officer's billet 
assignments, and not simply information about the unit currently assigned 
to). Similarly, FQN for ADMI would result in a name closer to CURRENT. 
UNIT_ASSIGNMENT. This is a very over simplified treatment of the theory of 
Fully Qualified Names, and is included only to illustrate the current thrust of 
standardization efforts and of conventions and procedures available for 
resolution of this kind of conflict. Whatever approach is taken, the global 
controller’s comprehensive definition includes mapping tables to allow 
decomposition of queries against global schema names back to the component 
databases. 

(2) Entity Structure Conflicts. An example of missing attri- 
butes is found in UNIT which in the OPINS database does not include an 
attribute for the unit Zip code, while UNIT in the ADMI does. Likewise, 
LANGUAGE in the ADMI has an attribute for listening proficiency, while 
LANGUAGE in OPINS contains an attribute for writing skill. The resolution of 
missing attribute conflicts is simple. The union of attribute sets is taken for 
equivalent entities, which ensures that all originally available data is 
included in the global schema. In resolving one conflict, however, the intro- 
duction of new conflicts should be avoided. This possibility is exemplified by 
the technique of taking unions of different attribute sets, which solves miss- 
ing attribute problems, but which may raise new overlapping conflicts. 

Overlapping attributes are found in MEMBER-OFR 
(IMAPMIS), which does not contain the member’s name, contrasted with 
COMMISSIONED.OFFICER (OPINS), which does. This is due to the 
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Generalization/Specialization relationship of MEMBER-OFR to MEMBER in 
IMAPMIS. The member's name is represented by an attribute of MEMBER. 
This is solved by decomposing COMMISSIONED_OFFICER into a General- 
ization/Specialization structure, segregating the appropriate attributes which 
apply to each part of the relationship. Choosing among remaining redundant, 
or overlapping attributes after this entity structure conflict is resolved 
requires more analysis. 

Where two or more attributes from different databases 
represent truly equivalent data elements, the attribute with the most fully 
detailed name, definition, and accuracy, or a stemdardized data element, if 
available, should be chosen, and the redundant attributes excluded from the 
global schema. Returning to the example of overlapping attributes above, the 
NAME attribute from IMAPMIS would be the choice, since its specification is 
more semantically rich than either of the other two name attributes. The 
global controller needs in its detailed definition the appropriate look-up tables 
to match the chosen global entity to the corresponding attributes in the com- 
ponent databases. In this case, the more fully detailed choice is intuitive to 
the user, since the detailed definition of the IMAPMIS NAME attribute sub- 
sumes the definitions of the other two with no loss of meaning. 

(3) Entity Constraint Conflicts. When the cardinality of rela- 
tionship between two entities varies across two or more schemas, it is termed 
an entity constraint conflict. This is shown by the one-to-one relation between 
UNIT and ACTIVE_DUTY_MILITARY_MEMBER in the ADMI, as opposed to the 
one-to-many relation between UNIT and COMMISSIONED.OFFICER in the 
OPINS. As mentioned, this conflict results from the fact that UNIT in the 
OPINS database does not represent an equivalent entity to UNIT in the ADMI 
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database. This is an example of interdependency of conflicts, where one t3rpe 
of conflict causes another conflict of a different type. In this case, solving one 
(renaming the OPINS UNIT) will also resolve the other. But as seen in the case 
of missing attributes under entity structure, an imcritical, isolated approach 
to resolution of semantic conflicts can become a circular problem. The resolu- 
tion of one type results in new instances of a different type of conflict. For a 
true cardinality or participation constraint conflict, the re-engineering ana- 
lyst needs to use the constraint that reflects the actual semantics of the 
application area of interest. Further research into this area of resolution 
strategy is suggested. 

h. Attribute Level Conflicts 

Resolution of attribute level conflict covers the same concep- 
tual range as the entity level. Appropriate renaming, and inclusion/ 
elimination of missing or overlapping attributes can successfully deal with 
naming and structure conflicts. 

(1) Attribute Name Conflicts. Like entity name conflicts, 
these comprise s3monyms and homonyms. Samples of attribute s)mon3nns 
from the databases of interest are DESIGNATOR (OPINS) and DESIG 
(IMAPMIS), as well as ORIGINAL_SOURCE_CODE (OPINS) and SOURCE_OF_ 
COMMISSION (ADMI). These both illustrate identical real world facts called by 
different names. 

An examples of homon3rms is UNIT_IDENTIFICATION_ 
CODE (ADMI) contrasted to UNIT_IDENTIFICATION_CODE (OPINS). These two 
identically named attributes represent different real world facts. The ADMI 
defines the unit identification code as an 8-character code which captures 
Department of Defense wide unit identification, while the same attribute in 
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the OPINS is defined as a standard 5-character Navy Unit Identification Code 
(UIC). The global controller will have look-up tables to allow mapping between 
global and component attributes in an identical manner to that discussed in 
the above section on entity conflicts. 

(2) Attribute Structure Conflict. This is illustrated by RACE_ 
ETHNIC in ADMI, and the two attributes RACE and ETHNIC in the OPINS. This 
case demonstrates a single attribute to multiple attribute structure conflict. 
Alternatively, the real world value of an officer's warfare designator is 
represented by the atomic attribute DESIG in the OPINS, while the IMAPMIS 
database breaks this information down into DESIG-CAT and DESIG-STAT, 
which themselves are part of the composite attribute DESIG. The suggested 
resolution strategy for attribute structure conflict is to capture the available 
information at the finest granularity (i.e., using the largest number of 
attributes). If RACE and ETHNIC contain the same data as RACE.ETHNIC, 
then the global controller will decompose that query into the two atomic 
attributes. The same holds true for the designator information. In this way, 
no data is lost, and the additional flexibility to manipulate the available 
information in useful ways is gained over using the single combined attribute. 

(3) Attribute Constraint Conflicts. An example of type clash 
is give by SOCIAL.SECURITY.NUMBER (ADMI), which is defined as a 4-byte 
packed integer, while the identical information is defined as a 9-character 
string for SSN (OPINS). The two Year/Month/Day attributes DATE_OF_BIRTH 
(ADMI) and DOB (IMAPMIS) are similarly mismatched, as the first is stored as 
a 3-byte packed integer, and the second as a 6-character string. 

Allowable value, or range, clash, is also illustrated by the 
two date attributes just noted. In the IMAPMIS, the member's date of birth is 
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defined as having a value between January 1, 1900 and December 31, 1999, 
An incompatible remge is defined for the ADMI, since the date of birth in this 
database can take on any 6-digit value which corresponds to a valid date (in 
other words, the date is only constrained to be a date, and could represent a 
value outside that allowed for the same date in the IMAPMIS). 

To resolve both tsqje and range clashes, the global schema 
attribute is redefined to subsume the definitions of the conflicting attributes. 
This strategy is a very rough rule of thumb at best, since it invites instances 
of inconsistent data, discussed below. The global controller will have to per- 
form the translation and comparison functions described below to deed with 
the potential inconsistency. 

c. Entity Attribute Conflicts 

An example of this is the member’s security clearance informa- 
tion, which in the OPINS and IMAPMIS databases is represented as separate 
entities; SCRTY-CLEAR in the IMAPMIS, and SECURITY.CLEARANCE in the 
OPINS. The equivalent real world information (though less detailed) is stored 
by the ADMI as the composite attribute SECURITY_INVESTIGATION, and the 
atomic attribute SECURITY.CLASSIFICATION, both belonging to the ACTIVE. 
DUTY_MILITARY_MEMBER entity. Resolution of this type of semantic conflict 
proceeds by removing the appropriate attributes from the entity they describe 
in the separate database, and migrating them to the separate entity in the 
global schema. (This approach assumes that the global schema will always 
represent at a minimum the sum of independent entities from the separate 
databases, taking equivalence mapping into account. The global controller 
knows where to find the equivalent information among the component 
databases, even when the individual schemas present that information at 
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different conceptual levels of organization. Thus there would be no case in 
which the attributes of an existing entity would be migrated to a higher order 
entity in the global schema. This is consistent with the basic philosophy of 
representing data in the global schema at the finest possible granularity.) 

d. An Integrated Global EER Schema For Three Personnel 
Databases 

Applying the heuristics and suggested resolution strategies 
listed above results in a global EER schema for the Active Duty Military 
Inventory, Officer Personnel Information System, and Inactive Manpower 
And Personnel Management Information System databases. This schema can 
then be used to guide the formulation of queries against the total original 
volume of data available across all three databases. Figure 20 shows the 
completed global EER schema. 

2. Data Level Conflicts 

Data level conflicts, which include inconsistencies, and data repre- 
sentation conflicts, present a much more difficult resolution problem. Often 
the only choice is to go back to the user, or recapture the original data from 
domain of interest. These conflicts only arise when data is returned from a 
query against the federated database. The global controller must be imple- 
mented with a capacity to deal with the extraction, conversion, comparison, 
and resolution of these data level conflicts. The following heuristics can be 
applied to the design of the global controller, but with the understanding that 
they are by no means assured of correct results. 
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Figure 20. Integrated Global EER Schema for ADMI, OPINS and IMAPMIS 
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а. Inconsistencies 

A simple, and obvious example of an inconsistency is the ADMI 
database returning a PAY_GRADE of 4, corresponding to 04, or Lieutenant 
Commander, for a given commissioned officer, while the OPINS returns a 
value of 3 for the attribute PRESENT_GRADE, indicating a rank of Lieutenant. 
A heuristic would be to accept the data from the database with the most 
recent update. This will not guarantee accuracy, but offers a simple and low 
effort approach. Alternatively, other data might be available to cross-verify 
and resolve the conflict (i.e., pay information might correspond to one rank 
and not another, or the DOR could be checked against years of commissioned 
service, to see if one rank was irrational). An important qualification of this 
second alternative is that it would be processing based, as opposed to a 
schematic resolution. This processing would be included in the detailed 
implementation of the global controller. 

б. Data Representation Conflicts 

Dissimilar expressions can often only be resolved by accepting 
data values from eill heterogeneous databases queried by the global schema, 
and deciding by inspection whether the information is equivalent, and which 
value to accept. Alternatively, an automatic resolution might be built into the 
global controller. Such a solution would have to depend on large and ineffi- 
cient look-up tables covering literally every conceivable expression which 
could represent the equivalent information of interest. This is because 
expression conflicts cover such a broad spectrum of possibilities, and can arise 
when no other classifiable conflicts are known or expected. Additionally, such 
a mapping scheme would necessarily be dynamic, since each new user verified 
instance of an equivalent, though conflicting, expression would have to be 
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included for future reference. Dissimilar units, and dissimilar precisions 
admit to some general rules of thumb for resolution which are noted below. 

(1) Dissimilar Expressions. An example of this is ACTIVITY_ 
TITLE, a character attribute which in OPINS represents the UNTT's text name, 
such as 'COMSURFRON THREE’. Contrast this to the attribute ACTY-LANG- 
NAME, a character attribute, which IMAPMIS uses for the same information. 
The actual string stored in this attribute for the equivalent unit might be 
'Cmdr, Sfc Sqdrn 3'. Bearing in mind that the more richly defined attribute 
was suggested above for inclusion in the global schema (in this case ACTY- 
LANG-NAME) the expression conflict would arise when the value from OPINS 
was returned and clashed with that from IMAPMIS. Further research is 
required to resolve this kind of conflict short of post query inspection and 
addition of verified equivalent representations to the look-up table, since it is 
a result of purely subjective choice as to appropriate content. 

(2) Dissimilar Units. This is illustrated by the UNIV-DUR 
attribute, in the IMAPMIS database, which represents a 2-character value for 
the length of an officer's course of instruction in weeks. On the other hand, 
the DURATION attribute in the OPINS is also two characters, but represents 
the length of a course of instruction in months. This kind of data conflict is 
amenable to the FQN approach mentioned above, since one would be repre- 
sented as UNIV-DUR-IN-WEEKS, with the other as DURATION_IN_MONTHS. To 
resolve this conflict in the context of the global controller for a federated 
database, each value would be retrieved, based on a (possibly) user-defined 
query in a given vinit. The controller would accept both values, translate them 
into a common unit, compare them for consistency, and return the informa- 
tion to the user in the requested units. It is interesting to note that if the 
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values still conflict after trainslation, the conflict becomes an inconsistency, 
rather than a dissimilar units conflict. Additionally, in this specific case, the 
only time an inconsistency will arise is when the absolute values returned 
from the component databases match, since the original conflict is due to 
their difference in definition. 

(3) Dissimilar Precisions. This type of data level conflict is 
shown by READING_PROFICIENCY from the ADMI database. This 1-character 
attribute is constrained to the numeral values of zero through nine, with nine 
being defined as excellent, and zero as unacceptable, with eight gradations 
completing the allowable values. The OPINS definition for SKILL_READ, how- 
ever, while it is also a 1-character attribute, groups the allowable ten 
numeral range into four sub ranges, from zero-one meaning poor, to eight- 
nine meaning outstanding. In this case, the attribute definition with the finer 
granularity should be chosen for the global schema (to capture all available 
information), and during retrieval the less precise attribute values mapped 
onto that scale by means of a look-up table. If after this mapping, the values 
from the two databases still do not agree, the conflict devolves to an 
inconsistency, as noted above. 
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VII. CONCLUSIONS AND RECOMMENDATIONS 



A. CONCLUSIONS 

Analysis of several independently developed and maintained real world 
databases from the same functional area shows that the expected heterogene- 
ity does exist. Three levels of heterogeneity can be recognized; platform, 
DBMS, and semantic. Of these three, much effort has gone into resolving the 
technical problems of making a global query against databases of fundamen- 
tally different organization. Problems such as formulating a relational state- 
ment that can be processed by a CODASYL based DBMS admit to technical 
solutions. While this type of research addresses platform and DBMS hetero- 
geneity, there is still an urgent need to identify and resolve semantic con- 
flicts, or differences in the meaning of information stored in existing diverse 
databases. 

To effectively identify and classify all types of semantic heterogeneity, 
data organizations must be expressed in a common schema. The Enhanced 
Entity Relationship model is an appropriate one for forming an integrating 
schema of heterogeneous databases. Because it is semantically rich, and has 
foimd wide use in initial design of databases (whatever their finsd implemen- 
tation), it is a useful model for reverse engineering existing applications and 
transforming them into equivalent schemas. 

By systematically comparing different schemas in the common model, 
the various types of semantic conflict are identifiable, and can be usefully 
grouped in a framework. A large part of the semantic conflicts found result 
from arbitrary and undisciplined application of naming conventions and data 
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definitions during the original design. This framework represents a powerful 
methodological tool for the analysis of any set of heterogeneous databases 
which are expressed in the EER model. 

The major weakness noted in this process is the difficulty of correctly 
capturing the original users' intentions regarding relationship constraints 
and cardinalities. It is felt that this is due to the fact that although relation- 
ship constraints and cardinalities are explicitly represented in an EER 
schema (such as one resulting from an initial top down design effort), these 
constraints are usually enforced at the implementation level through 
procedures rather than being captured in the schema itself. It is unclear that 
any level of database description available to the re-engineering analyst, 
short of a detailed source code listing of the actual application, will allow the 
original relationship constraints to be conceptually modeled with complete 
accuracy. 

The process of exploring possible solutions to the various types of seman- 
tic conflict reveals that a wide spectrum of techniques apply. Some resolu- 
tions are simple, such as renaming and associated look-up tables, and provide 
certainty of a correct solution. Other types of semantic conflict are extremely 
difficult to resolve, particularly data inconsistencies. While recourse to the 
user, or re-examination of the real world information, will certainly deal with 
these problems, a more complete theoretical approach should be pimsued. 

B. RECOMMENDATIONS 

Current Department of Defense efforts to institute Fully Qualified 
Naming (FQN) principles show great promise for eliminating many types of 
semantic conflict identified herein. FQN should be fully enforced for all new 
Department of Defense database applications. 
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FQN, however, will primarily benefit newly designed databases. There 
remains a need for an integrating model to support the integration of existing 
heterogeneous databases and the resolution of semantic conflict. This inte- 
grating model should be semantically rich enough to subsume the conceptual 
organizations of old and new databases. The Department of Defense should 
designate a suitable conceptual data model to be used in all efforts to inte- 
grate existing heterogeneous databases, and develop or procure the support- 
ing tools to facilitate integration using the common conceptual model. 

C. FUTURE RESEARCH EFFORTS 

FQN will not solve the problems of semantic heterogeneity in existing 
databases (short of complete redesign). Therefore, further research is sug- 
gested in the area of general solutions to resolving the types of semantic con- 
flict identified by the classification framework. In particular, interdependen- 
cies of conflicts, some of which were noted in the course of this analysis, 
should be more rigorously investigated. Efforts to resolve semantic conflict 
would benefit greatly from a framework similar to the one presented here, 
which could enumerate various interdependencies, and provide assured ways 
of resolving each, without introducing new conflicts. 

Additional research is also warranted in the field of reverse engineering 
and the development of conceptual models for existing implementations. For 
example, determining relationship constraints and cardinalities from existing 
specifications. The ability to accurately capture this semantic content without 
recourse to a detailed analysis of DBMS processing algorithms would greatly 
enhance the usefulness of the bottom-up integration strategy suggested by 
this thesis. 
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APPENDIX D 

ADMI ENTITY STRUCTURE 



Active Duty Master Inventory (ADMI) 

ACTIVE_DUTY_MIUTARY_MEMBER = 

SOCIAL_SECURITY_NUMBER ^ey) 
+ NAME (comp) 

+ DATE_0F_BIRTH (comp) 

+ SEX 

+ RACE.ETHNIC 
+ ETHNIC.GROUP 
+ PAY_ENTRY_BASE_DATE (comp) 

+ SERVICE 
+ MOS (comp) 

+ DATE_OF_CURRENT_RANK(comp) 

+ PAYGRADE 

+ SECURITY.CLASSIFICATION 
+ SECURITYJNVESTIGATION (comp) 
+ EDUCATION (comp) 



NAVAL_SERVICE_MEMBER = 

+ NAVAL_SECURITY_INVESTIGATION_TYPE 
+ SERVICE.SPECIFIC 



NAVAL.OFFICER = 

+ YEARS_OF_COMMISSIONED_SERVICE 
+ SOURCE_OF_COMMISSION 



UNIT = 

UNITJDENTIFICATION.CODE (key) 
+ DUTY.LOCATION 
+ UNIT_ZIP_CODE 
+ MAJOR_COMMAND_CODE 
+ PROGRAM_ELEMENT_CODE 
+ STRENGTH_ACCOUNTING_STATUS 
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LANGUAGE = 

+ IDENTITY (pkey) 

+ DATE_LAST_TESTED (comp) 
+ LISTENING.PROFICIENCY 
+ SPEAKING.PROFICIENCY 
+ READING.PROFICIENCY 
+ PROFICIENCY.SOURCE 
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APPENDIX E 

OPINS ENTITY STRUCTURE 



Officer Personnel Information System (OPINS) 

COMMISSIONED.OFFICER = 

SSN (key) 

+ NAME (comp) 

+ DATE_OF_BIRTH (comp) 

+ SEX 
+ RACE 
+ ETHNIC 
+ PEBD (comp) 

+ ACTIVE_COMMISSION_BASE_DATE (comp) 

+ DESIGNATOR 
+ DOR (comp) 

+ PRESENT.GRADE 
+ ORIGINAL_SOURCE_CODE 



UNIT = 

UNIT_IDENTIFICATION_CODE (comp) (sub-attribute is key) 
+ HOMEPORT 
+ TYPE.ASSIGNMENT 
+ ACTIVITY.TITLE 
+ BILLET_SEQUENCE_NUMBER Gcey) 

+ DATE.ASSIGNED (comp) (key) 

+ FROM (comp) 

+ TO (comp) 



LANGUAGE.SKILL = 

+ CODE (pkey) 

+ PROFICIENCY.YEAR 
+ METHOD (comp) 

+ SKILL (comp) 



SECURITY.REQUIREMENT = 

SECURITY.CODE (pkey) 

+ SECURITY.AGENCY 
+ SECURITYJNVESTIGATION.DATE (comp) 
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YEAR.GROUP = 

YEAR_GROUP_ID (comp) (KEY) 

+ PROJECTED_AUTHORIZED_PROMOTION_DATE (comp) 
+ PROJECTED_AUTHORIZED_PROMOTION_GRADE 



EDUCATION = 

COLLEGE.NAME (pkey) 

+ YEAR.COMPLETED (pkey) 
+ LEVEL 
+ DURATION 
+ MAJOR 
+ SPECIALTY 
+ SPONSOR 
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APPENDIX F 

IMAPMIS ENTITY STRUCTURE 



Inactive Manpower And Personnel Managment Information Sysytem 
(IMAPMIS) 

MEMBER = 



SSN 


(key) 


+ 


MEMBER-NAME (comp) 


-f 


DOB (comp) 


-f 


SEXC 


■f 


RACE 


+ 


ETHN 




PEBD (comp) 




OFCR-ENL-IND 



MEMBER-OFR = 

+ GRD-CD 
+ SRCE-ORIG (comp) 

+ DT-PRMTN (comp) 

+ DESIG (comp) 

+ BASE-DT-CMSN-SVC (comp) 



EDUC = 

+ UNIV-NAME (pkey) 

+ UNIV-DT-CMP (pkey) 
+ UNIV-SPNSR 
+ UNIV-DUR 
+ UNIV-LVL 
+ UNIV-MAJ 
+ UNIV-SPEC 



AATY = 

+ ACTIV-UIC (key) 

+ ACTY-LANG-NAME 
+ GEOGRAPHIC-LOG 
+ UNIT-ADRS (comp) 

+ AATY-ATC 
+ PROG-ELEMENT-CD 
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SCRTY-CLEAR = 

+ SCRTY-INVST-DT (pkey) (comp) 
+ SCRTY-AGCY 
+ SCRTY-INVST-TYPE 
+ SCRTY-CLR-AUTH 
+ SCRTY-CLR-AUTH-DT (comp) 



LANG = 

+ LANG-ID (pkey) 

+ LANG-APRSL (comp) 

+ LANG-METH-APRSL (comp) 
+ LANG-DT-TESTED (comp) 
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APPENDIX G 

ADMI DATA DEFINITIONS 



Active Duty Master Inventory ADMI 



Attribute Type Length Key Range 

ACriYE_DUTY_MILrrARY_MEMBER Entity 
SOCIAL_SECURITY_ 

NUMBER NI 4 Y 0-9 

**Member's Social Security Number (in 4 byte packed numeric format). 



NAME 


C 


27 


-LAST 




15 


-FIRST 




11 


-MIDDLE 




1 



**Member's full name (in Last, First, MI (including ","s) format). 



A...Z 



Constraint 



Mandatory 



DATE_OF_BIRTH NI 3 N 0-9 

**Member's date of birth (in YYMMDD 3 byte packed numeric format). 



SEX NI 1 

**Member's sex (l=Male, 2=Female). 


■N 


1,2 


RACE.ETHNIC C 1 

**Member's Race (Caucasian, African, etc.). 


N 


C,M,A,I,H 


ETHNIC.GROUP C 1 

**Member's ethnic group (special code). 


N 


A-Z, 0-9 


PAY_ENTRY_ 







BASE.DATE NI 3 N 0-9 

**Member's start date for calculation of time in service for pay purposes (in YYMMDD 3 byte 
packed numeric format). 



SERVICE NI 1 N 1,2, 3, 4 Mandatory 

**Member's service (l=Army, 2=Air Force, 3=Navy, 4=Marines). Defining attribute of 
SERVICE-MEMBER Specialization. 

MOS C 14 N A-Z, 0-9 

-PRIMARY 7 

-DUTY 7 

**Member's Military Occupational Specialty (code, both the MOS gained by training 
(Primary), and for the current assignment (Duty)). 
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Attribute 


Type 


Length 


Key 


Range 


Constraint 


DATE.OF.CURRENT. 

RANK 


NI 


2 


N 


0-9 





♦^Member's date of promotion to current pay grade (in YYMM 2 byte packed numeric 
format). 

PAYGRADE NI 1 N 0-9 

**Member's current paygrade (l=El/Ol, 2=E2/02, etc., covering grades for Naval Officers 

from Ensign (01) to Admiral {OlO}). 

SECURITY. 

CLASSIFICATION NI 1 N 0-9 

**Member's security clearance (0=None, l=Classified, 2=Secret, 3=Top Secret, etc.). 

SECURITY. 

INVESTIGATION NI 3 N 0-9 

-TYPE 1 

-DATE.OF. 

COMPLETION 2 

**Type of security investigation completed for member (0=None, l=National Agency Check, 
2=Background Investigation, 3=Special Background Investigation, etc.), and date on which it 
was completed (in YYMM 2 byte packed numeric format). 

EDUCATION NI 3 N 0-9 

-CODE 1 

-CERT 1 

-HIGHEST.YEAR 1 

**Member's educational data, including code for college level courses (0=No, l=Yes), 
certification of High School completion (0=No, l=Yes), and highest year of schooling 
completed (in 1 byte packed numeric format). 



NAVAL.SERVICE.MEMBER Entity 
NAVAL.SECURITY. 

INVESTIGATION. 

TYPE C 2 NO-9 

**Special security investigation information required for Naval service members (field is null 
for other services). 

SERVICE.SPECIFIC C 2 NO-9 

**Meaning of attribute varies according to member’s service. 

NAVAL.OFFICER Entity 
YEARS.OF. 

COMMISSIONED. 

SERVICE NI 1 N 0-9 

**Officer's total years of commissioned military service (in 1 byte packed numeric format). 
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Attribute 


Type 


Length 


Key 


Range 


Constraint 


SOURCE.OF. 

COMMISSION 


NI 


1 


N 


0-9 





**Officer's commissioning source (0=Service Academy, l=ROTC, 3=Officer Candidate School, 
etc.). 



UNIT Entity 
UNIT. 

IDENTIFICATION. 

CODE C 8 Y A-Z, 0-9 Mandatory 

♦♦Department of Defense 8 digit Unit Identification Code (includes 3 digit service/component 
identification, plus 5 digit std Navy UIC) 

DUTY.LOCATION NI 1 N 0-9 

♦♦Unit's geographic location (0=Continental US, l=Europe, 2=Japan, 3=Middle East, etc.). 

UNIT.ZIP.CODE C 5 NO-9 

♦♦Unit's 5 digit postal Zip code. 

MAJOR. 

COMMAND.CODE NI 3 N 0-9 Mandatory 

♦♦Unit's assignment to major force command (121=CINCPACFLT, 333=USAREUR, 
542=Sixth FLT, etc.). 

PROGRAM. 

ELEMENT.CODE CON A-Z, 0-9 Mandatory 

♦♦Unit's budgetary funding program element code. 

STRENGTH. 

ACCOUNTING. 

STATUS NI 1 N 0, 1 Mandatory 

♦♦Unit's is required to continuously report total percentage of authorized end stregth (0=No, 
l=Yes). 



LANGUAGE Entity 

IDENTITY Cl P A-Z Mandatory 

♦♦Foreign language (F=French, R=Russian, M=Mandarin, A=Arabic, etc.). 

DATE.LAST. 

TESTED NI 2 N 0-9 

♦♦Date on which the language proficiency was last tested (in YYMM 2 byte packed numeric 
format). 

LISTENING. 

PROFICIENCY NI 1 N 0-9 

♦♦Level of apptitude in listening comprehension for a foreign language (0=Unacceptable, 
l=Very Poor, 2=Poor, 3=Below Average, 4=Average, 5=Above Average, 6=Good, 7=Very Good, 
8=Excellent, 9=Fluent). 
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Attribute 


Type 


Length 


Key 


Range 


Constraint 


READING. 

PROFICIENCY 


NI 


1 


N 


0-9 





**Level of apptitude in Reading comprehension for a foreign language (0=Unacceptable, 
l=Very Poor, 2=Poor, 3=Below Average, 4=Average, 5=Above Average, 6=Good, 7=Very Good, 
8=Excellent, 9=Fluent). 

SPEAKING. 

PROFICIENCY NI 1 N 0-9 

**Level of apptitude in speaking comprehension for a foreign language (0=Unacceptable, 
l=Very Poor, 2=Poor, 3=Below Average, 4=Average, 5=Above Average, 6=Good, 7=Very Good, 
8=Excellent, 9=Fluent). 

PROFICIENCY. 

SOURCE NI 1 N 0-9 

**Source of the proficiency ratings for Listening, Speaking and Reading (0=Assessment by 
supervisor on duty, l=Local Test, 2=Formal language school, 4=Defense Language Institues, 
etc.). 
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APPENDIX H 

OPINS DATA DEFINITIONS 



Officer Personnel Information System OPINS 

Attribute Type Length Key Range Constraint 

COMMISSIONED.OFFICER Entity 

SSN NI 9 Y 0-9 Mandatory 

**Member's Social Security Number. 



NAME 


C 


27 


-LAST 




16 


-FIRST 




10 


-MIDDLE 




1 



**Member's full name (in Last , First , MI format, including spaces). 

DATE_OF_ 

BIRTH NI 6 N 0-9 

**Member's date of birth (in YYMMDD format). 

SEX C 1 N M, F 

**Member’s sex. 

RACE Cl N A-Z 

**Member's race (C=Caucasian, N=Negroid, H=Hispanic, etc.). 

ETHNIC Cl N A-Z 

**Member's ethnic group (arbitrary code, A=North European, B=Canadian, C=East 
European, etc.). 

PEBD NI 6 N 0-9 

**Member's pay entry base date (in YYMMDD format). 

ACTIVE. 

COMMISSION. 

BASE.DATE NI 6 N 0-9 

**Member's starting date of commissioned service (in YYMMDD format). 

DESIGNATOR NI 4 N 0-9 Mandatoiy 

**Officer's warfare designator (1110=Active Duty Surface Warfare, 1115=Reserve Surface 
Warfare, etc.). 

DOR NI 6 N 0-9 

**Member's date of present grade (in YYMMDD format). 
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Attribute 


Type 


Length 


Key 


Range 


Constraint 


PRESENT.GRADE 


NI 


1 


N 


0-9 


Mandatory 



**Officer's current paygrade (1=01, 2=02, etc., covering grades from Ensign {01} to Admiral 
(OlO)). 

ORIGINAL. 

SOURCE.CODE Cl N A-Z 

**Officer's original commissioning source (A=Naval Academy, R= Reserve Officer Training 
Corps, 0=0fficer Candidate School, etc.). 



UNIT Entity 
UNIT. 

IDENTIFICATION. 

CODE NI 10 N 0-9 Mandatory 

-ACTUAL.UIC 5 K 

-PARENT.UIC 5 Y 

**Navy 5 digit unit identification code for both unit assigned, and Immediate Superior in 
Command (ISIC) of that unit. 

HOMEPORT C 6 N A-Z 

**Plain language name (or abbreviation) of unit’s assigned homeport. 

TYPE. 

ASSIGNMENT NI 1 N 0-9 

**Unit's duty type assingment (0=Sea, l=Continental US, 2=0verseas, etc.). 

ACTIVITY.TITLE C 16 N A-Z, 0-9 

**Unit's plain language title (or abbreviation). 

BILLET.SEQUENCE. 

NUMBER NI 5 K 0-9 

**Specific duty assignment by billet number (12345=Commanding Officer, 67890=Executive 
Officer, etc.). 

DATE. 

ASSIGNED NI 4 K 0-9 

**Date assignment was made to the specific duty billet (in YYMM format). 

FROM NI 4 N 0-9 

**Date the specific duty billet the specific duty billet assignment was assumed (in YYMM 

format). 

TO NI 4 N 0-9 

**Date the specific duty billet assignment was vacated (in YYMM format). 
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Attribute TyP® 


Length 


Key 


Range 


Constraint 


LANGUAGE.SKILL Entity 


CODE NI 

Mandatory 


2 


P 


0-9 





**Foreign language (01=Spanish, 43=French, ll=Russian, 24=Arabic {Iraqi dialect), 
52=Farsi, etc.). 



PROFICIENCY. 

YEAR NI 2 N 0-9 

**Year in which proficiency in a language was most recently tested (in YYMM format). 



METHOD 

-COMP 

-READ 

-WRITE 

-SPEAK 



C 4 NO-9 

1 
1 
1 
1 



**Method used to appraise the level of aptitude in comprehension, reading, speaking, 
writing, and speaking a langiiage (0=Assessment by supervisor on duty, l=Local Test, 
2=Formal language school, 4=Defense Language Institute, etc.). 



SKILL C 4 

-COMP 1 

-READ 1 

-WRITE 1 

-SPEAK 1 



N 0-9 Mandatory 



**Level of apptitude in comprehension, reading, speaking, writing, and speaking a language 
(0-l=Poor, 2-4=Average, 5-7=Good, 8-9=Outstanding). 



SECXJRITY.REQUIREMENT Entity 
SECURITY. 

CODE NI 1 P 0-9 Mandatory 

**Level of security classification for which investigation requirements have been completed 
(0=None, l=Classified, 2=Secret, 3=Top Secret, etc.). 

SECURITY. 

AGENCY C 6 N A-Z 

♦♦Agency abbreviation which completed security investigation NVINSV=Naval Investigative 
Sevice, DFINSV=Defense Investigative Sevice, FDBUIN=Federal Bureau of Investigation, 
CTINAY=Central Intelligence Agency, etc.). 

SECURITY. 

INVESTIGATION. 

DATE NI 6 N 0-9 

♦♦Date of completion of security investigation (in YYMMDD format). 
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Attribute 


Type 


Length 


Key 


Range 


Constraint 


YEAR GROUP Entity 
YEAR.GROUP.ID 


NI 


3 


Y 


0-9 


Mandatory 



-YEAR 2 

-SPLIT 1 

**Promotian year group (in YY , plus (0=No split, l=Split, lower half, 2=Split upper half) 
format). 

PROJECTED. 

AUTHORIZED- 

PROMOTION. 

DATE NI 6 N 0-9 

**Prospective date of promotion to next higher rank for memebers of the year group (in 
YYMMDD format). 

PROJECTED. 

AUTHORIZED- 

PROMOTION. 

GRADE NI 1 N 0-9 

**Prospective next rank of memebers of the year group (1=01, 2=02, etc., covering grades 
from Ensign (01) to Admiral (010)). 



EDUCATION Entity 

COLLEGE.NAME C 10 P A..Z Mandatory 

♦♦Educational institution name (or abbreviation) 

YEAR. 

COMPLETED NI 2 P 0-9 Mandatory 

♦♦Year in which a course of education was completed (in YY format). 

LEVEL Cl N A-Z 

♦♦Level of course of education (U=Undergraduate, G=Graduate, P=Postgraduate). 

DURATION NI 2 N 0-9 

♦♦Duration (in months) of course of education. 

MAJOR NI 2 N 0-9 

♦♦Academic major (12=Oceanography, 43=Aeronautical Engineering, 55=Electrical 
Engineering, etc.). 

SPECIALTY NI 2 N 0-9 

♦♦Naval warfare specialty associated with course of education (24=Surface Warfare, 

55=Antisubmarine Warafre, 87=Anitair Warfare, etc.). 

SPONSOR NI 1 N 0-9 

♦♦Navy organization which sponsored course of education (3=Op-03, 4=Op-04, 8=Op-8, etc.). 
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APPENDIX I 

IMAPMIS DATA DEFINITIONS 



Inactive Manpower And Personnel Managment Information Sysytem 
IMAPMIS 



Attribute Type 


Length 


Key 


Range 


Constraint 


MEMBEIR Entity 

SSN C 

**Member’s Social Security Number. 


9 


Y 


0-9 


Mandatory 


MEMBER-NAME C 27 

-SURNAME 13 

-FIRST 7 

-MIDDLE 5 

-POSITION 2 

♦♦Member's full name (includes JR., SR, 2, 3, etc). 


N 


A..Z, 1-9 




DOB C 6 

♦♦Member’s date of birth (in YYMMDD format). 


N 


0-9 


btwn 01/01/00 
and 01/01/99 


SEXC C 

♦♦Member's sex 


1 


N 


M/F 




RACE C 1 

♦♦Member's Race (Caucasian, African, etc.). 


N 


C,M,A,I,H 




ETHN C 

♦♦Member's ethnic group (code). 


1 


N 


0-9 




PEBD C 6 N 

♦♦Member's Pay Entry Base date (in YYMMDD format). 


0-9 


btwn 01/01/00 
and 01/01/99 


OFCR-ENL-IND C 

♦♦Member's Officer/Enlisted status. 


1 


N 


0,E 


Mandatory 


MEMBER-OFR Entity 
GRD-CD C 

♦♦Officer's present rank (01-09). 


1 


N 


0-9 


Mandatory 
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Attribute 


Type 


Length 


Key 


Range 


Constraint 


SRCE-ORIG 


C 


3 


N 


a..Z, 0-9 




-SRCE-CD 




2 








-SRCE-STAT 




1 








♦♦Officer's original commissioning source and active/reserve status (Code). 




DT-PRMTN 


C 


6 


N 


0-9 


btwn 01/01/00 
and 01/01/99 


♦♦Member's date of Rank (in YYMMDD format). 








DESIG 


c 


10 


N 


0-9 




-DESIG-STAT 




1 








-DESIG-CAT 




3 








-DESIG-DT 




6 









♦♦Officer's warfare designator (3 digit specialty, and 1 digit active/reserve ind) and date of 
award of designator (in YYMMDD format). 



BASE-DT-CMSN- 

SVC C 6 NO-9 btwn 01/01/00 

and 01/01/99 

♦♦Member s date of commencement of commission service (in YYMMDD format). 



EDUC Entity 



UNIV-NAME 
♦♦Educational institution 


C 

name 


10 

(or abbreviation) 


P 


A..Z 


Mandatory 


UNIV-DT-CMP 


C 


2 


P 


0-9 


Mandatory 



♦♦Year in which a course of education was completed (in Yy format). 

UNIV-SPNSR C 1 N A-Z, 0-9 

♦♦Navy organization which sponsored course of education (3=0p-03, 4=Op-04, 8=Op-8, 

C=Navy Comptroller, P=Bureau of Naval Personnel, R=Chief of Naval Reserve Force, etc.). 

UNIV-DUR C 2 NO-9 

♦♦Duration (in weeks) of course of education. 

UNIV-LVL C 1 N U,G,P 

♦♦Level of course of education (U=Undergraduate, G=Graduate, P=Postgraduate). 

UNIV-MAJ C 2 NO-9 

♦♦Academic major (OC=Oceanography, AE=Aeronautical Engineering, EE=Electrical 
Engineering, etc.). 

UNIV-SPEC C 2 NO-9 

♦♦Naval warfare specialty associated with course of education (SW=Surface Warfare, 

AS=Antisubmarine Warafre, AA=Anitair Warfare, etc.). 
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Attribute Type 


Length 


Key 


Range 


AATY Entity 

ACTIV-UIC C 


5 


Y 


0-9 


**Naval 5 digit Unit Identification Code 






ACTY-LANG-NAME C 

**Unit's plain language title. 


26 


N 


A-Z, 0-9 


GEOGRAPHIC-LOC C 

**Unit's geographic location (code). 


8 


N 


0-9 


UNIT-ADRS C 


59 


N 


A-Z. 0-9 


-UNIT-ADRS-STRT 


30 






-UNIT-ADRS-CITY 


18 







-UNIT-ADRS-STAT 2 

-UNIT-ADRS-ZIP 9 

**Unit's full mailing address (including 9 digit Zip code). 



Constraint 



Mandatory 



AATY-ATC C 3 NO-9 

**Unit’s area type code (Overseas, Conus, etc.). 



PROG-ELEMENT-CD C 8 N 

**Unit’s budgetary funding program element code. 



SCRTY-CLEAR Entity 

SCRTY-INVST-DT C 6 P 0-9 btwn 01/01/00 

and 01/01/99 

**Date of completion of security investigation (in YYMMDD format). 

SCRTY-INVST-TYPE Cl NO-9 

**Type of security investigation completed (Code). 

SCRTY-AGCY Cl NO-9 

**Agency which completed security investigation (code). 

SCRTY-CLR-AUTH C 1 N U,C,S,T 

**Level of security classification authorized as a result of the security investigation. 

SCRTY-CLR- 

AUTH-DT C 6 P 0-9 btwn 01/01/00 

and 01/01/99 

**Date on which security classification was authorized (in YYMMDD format). 



LANG Entity 

LANG-ID C 2 P 0-9 Mandatory 

**Foreign language (SP=Spanish, FR=French, RU=Russian, IQ=Arabic {Iraqi dialect}, 
FA=Farsi, etc.). 
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Attribute 



Length 



Constraint 



Type 



LANG-APRSL C 4 

-LANG-COMP 1 

-LANG-READ 1 

-LANG-WRITE 1 

-LANG-SPEAK 1 



Key Range 

N 0-9 Mandatory 



**Level of apptitude in comprehension, reading, speaking, writing, and speaking a language 
(0-l=Poor, 2-4=Average, 5-7=Good, 8-9=Outstanding). 



LANG-METH-APRSL C 4 

-LANG-METH-COMP 1 

-LANG-METH-READ 1 

-LANG-METH-WRITE 1 

-LANG-METH-SPEAK 1 



N 0-9 



**Method used to appraise the level of aptitude in comprehension, reading, speaking, 
writing, and speaking a language (Code). 



LANG-DT-TESTED C 6 NO-9 

**Date on which language aptitude was appraised (in YYMMDD format). 



btwn 01/01/00 
and 01/01/99 
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